Research Data Management for Econometrics


Econometrics of Panel Data and Network Analysis

Research Data


Module 1

Dr. Peter Löwe

Berlin, 03. 08. 2017


1. Why bother: A crisis, horror stories & a Panda-Oncologist

2. Size is relative: Doctor House, Big Data, and a long tail

3. Reality Check: Doing science in the 21st century

4. Research Data Management according to Gollum and XKCD

5. Persistent Identifiers: Digital dog tags for everything and everyone !

6. Research Data Repositories & good reads

7. Conclusion: Culture change & happy Pandas

1 Today's menu

• Why Research Data Management matters and how it

should work (perfect world)

• How stuff currently works (state of the art)

• How stuff will work soon (outlook)

• How to get started (self help)

1 Drivers for Research Data Management

Why you should care (internal motivation)

• Increase the efficiency of your research process

• Avoid losing data

• Enable data re-use and sharing

Why you are going to care (external motivation)

• Meet the requirements of research funders and your institute

• Comply with the policies of a growing number of journal publishers on making the data underlying publications available

• Increase your visibility (citations)

1 Research Data includes

• Questionnaires/surveys

• Raw experimental data

• Analysed data

• Databases

• Simulations and research code (software)

• Audio-visual materials

• Laboratory and field notes

• Clinical data, including clinical records

• Images and photographs

1 The Research Data Spectrum

Physical → Digital

• Handwritten letters → Scanned & OCR version

• Images or photos → Scanned digital version

• Soil samples → Analysed result of samples

• Tissue samples → Analysed result of samples

• Archeological dig sites → 3D models of the dig site

• …

1 Issue: The Reproducibility Crisis

Nature 533, 452–454 (26 May 2016) doi:10.1038/533452a


• A methodological crisis in


• the phrase was coined in the

early 2010s as part of a

growing awareness of the


• 2016: poll of 1,500 scientists

• 70% of them had failed to

reproduce at least one other

scientist's experiment

• results of many scientific

studies are difficult or

impossible to replicate on

subsequent investigation

1 Data Sharing and Management Snafu in 3 Short Acts

[Snafu: „Situation normal, all f***ed up“]

1 Video

1 Discussion

Have you encountered something similar ?

How to deal with such a situation ?

Where do you store your data?

How much data would you lose if your laptop was stolen?

1 Reproducibility decreases over time due to increasing data loss

“In their parents' attic, in boxes in the garage, or stored on now-defunct

floppy disks — these are just some of the inaccessible places in which

scientists have admitted to keeping their old research data. Such practices

mean that data are being lost to science at a rapid rate, a study has now


1 Night of the Living Data

1 Self-help Groups

1 Way Out: Keep Science FAIR (perfect world)

Principles to ensure research data is FAIR:

Findable, Accessible, Interoperable, Reusable

“The problem the FAIR Principles address is the lack of widely shared, clearly

articulated, and broadly applicable best practices around the publication of scientific


“FAIRness is a prerequisite for proper data management and data stewardship”

Mark D. Wilkinson et al.: The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data (2016). DOI: 10.1038/sdata.2016.18

Data Storage Evolution

We are here



2 Life Expectancy of Digital Storage Media

Storage capacity grows, but not the lifespan

Average life span: about 10-30 years

2 Big Data Buzzwords: The Four V's

2 Size is not everything: Big Data and the Long Tail of Science

Big data from small data:

• Genome studies,

• Remote Sensing

Overall amount continues to increase due to "Big Data" (Volume | Velocity)

3 Data-driven Science

Paradigms of Science:

1. empirical,

2. theoretical,

3. computational,

4. data-driven

3 The Fourth Paradigm

"It's the data, stupid"

Dr Gray's call-to-arms was [..] “to have a world

in which

• all of the science literature is online,

• all of the science data is online, and they

• interoperate with each other.”

3 Innovation in Science travels at different velocities

4 Data Wrangling: Research Data Management (RDM)

Peter Löwe, 2017-08-02, Research Data Management: Module 1

[Figure: Data Curation Continuum from Pre Research to Post Research; transitions labelled Transfer, Transfer, Publication]

The Data Curation Continuum is divided into four domains of responsibility. During data transfer, the existing metadata are enriched with further elements. (After Klump, 2009)


4 Pre Research: Institutional Requirements

Institutional Policy and Procedures

Support services - people and other means of providing advice

and support

IT Infrastructure - the hardware, software and other


Metadata management - so that data records can be meaningful

and fit for purpose

Institutional Data Management Framework

4 Pre Research: Data Management Plan (perfect world)

data organisation and storage;

metadata standards and guidelines;


archiving for long-term preservation;

version control and derived data products;

data sharing or publishing intentions, including licensing;

ensuring security of confidential data;

data synchronisation; and

governance, roles and responsibilities.

4 Documentation 101

a) Document your data sets.

b) Ask your data repository how to document correctly (metadata!)

c) If you do not document, you're wasting an opportunity to receive credit by citation and reuse

d) Not to be missed:

Topic (keywords, controlled vocabulary, abstract)

Observation unit (counties, people, etc)

Database (random sampling, complete survey, etc.)

Sampling method


Access: limitations, embargo, point of contact (POC)

4 Metadata 101

Metadata (structured data about the data)

• Who collected the data?

• Who funded the research project?

• When (and where) was it collected?

• Instruments and setting for collecting the data?

• Title of the dataset

• Methods used to process the data

• Etc. etc.
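The questions above can be turned into a small machine-readable record stored next to the data. The following is only a minimal sketch: the field names and the example values are hypothetical and do not follow any formal metadata schema. Ask your repository which schema it expects.

```python
import json

# Field names are illustrative assumptions that mirror the questions above;
# they are not taken from any formal metadata schema.
REQUIRED_FIELDS = (
    "title",
    "collected_by",
    "funded_by",
    "collected_when",
    "collected_where",
    "instruments",
    "processing_methods",
)

# Hypothetical example values; replace every entry with the real details.
example_record = {
    "title": "Example household panel survey, wave 1",
    "collected_by": "Jane Doe",
    "funded_by": "Example Funding Agency",
    "collected_when": "2016-05-01/2016-09-30",
    "collected_where": "Berlin, Germany",
    "instruments": "Face-to-face questionnaire",
    "processing_methods": "Anonymised; missing values coded as -1",
}


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def to_json(record):
    """Serialise the record as human-readable JSON text."""
    return json.dumps(record, indent=2, ensure_ascii=False, sort_keys=True)


if __name__ == "__main__":
    print("Missing fields:", missing_fields(example_record))
    print(to_json(example_record))
```

Saving the JSON text as a plain file alongside the data set keeps the description with the data even when the folder is copied elsewhere.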

4 Appropriate File Formats

• Open and non-proprietary

• Human-readable, non-binary

• Patent-free

• ISO-standards

• Textual data: XML, TXT, HTML, PDF/A (Archival PDF)

• Tabular data (spreadsheets): CSV

• Databases: XML, CSV

• Images: TIFF, PNG, JPEG*

• Audio: FLAC, WAV, MP3
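As an illustration of the recommendation for tabular data, the sketch below writes a small table to CSV text using only the Python standard library. The column names and values are made up for this example.

```python
import csv
import io

# Hypothetical example table; column names and values are invented.
rows = [
    {"county": "A", "year": 2015, "income": 31250.5},
    {"county": "B", "year": 2015, "income": 28990.0},
]


def rows_to_csv(table):
    """Return the table as CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(table[0].keys()),
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(table)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(rows))
```

The result is plain text that can be opened in any editor or statistics package, which is the point of choosing an open, non-binary format.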

4 Include a Manifest / readme File !
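One possible way to produce a manifest is to list every file in the data folder together with its size and a checksum, so that later users can verify that nothing is missing or altered. This is a sketch under assumptions: the line layout (checksum, relative path, size) is a choice made for this example and not a standard.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Return one manifest line per file: checksum, relative path, size."""
    folder = Path(folder)
    lines = []
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        relative = path.relative_to(folder).as_posix()
        size = path.stat().st_size
        lines.append(f"{sha256_of(path)}  {relative}  {size} bytes")
    return "\n".join(lines)
```

Calling `build_manifest("my_dataset")` on a folder (the folder name is hypothetical) returns text that can be saved as a manifest file next to the readme.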

4 Data Life Cycle: Personal Domain Perspective

The most critical stage in the research data lifecycle is the completion of the research project. In most cases there is no follow-up funding to maintain the research data. Also, the scientist has to focus on the next project.


4 Publishing and Sharing Data

Publishing and Sharing data ≠ Open Access to data

• “Open” and “Closed” are relative concepts.

• “Closed” ≈ conditional access based on individual


• “Closed” ≈ conditional access based on roles

Metadata    Research Data

Open        Open

Open        Closed

Closed      Open

Closed      Closed

4 Continual data curation across domains

4 Data Curation Continuum: Visibility and Circulation

[Figure: Data Curation Continuum; transitions labelled Transfer, Transfer, Publication]

4 Data Delay Strategies ?

4 The Grant Cycle according to XKCD (and Machiavelli ?)

4 The Reputation Economy

Open Access to Data:

• Science has become a reputation economy

• The fundamental difference between disciplines is the trade-off between reputation

and collaboration at points of the reputation economy where changes in the form of

capital occur.

• Sharing data as a form of collaboration must be balanced by a similar gain in


• […]collaborative disciplines enforce data sharing as a social norm where non-

compliance will result in some form of penalty […]

4 Research Parasites Paradigm: Open Access for Data is evil


Lego Gollum

4 Alternative Paradigm: Sharing the fire of the Open Data "torch"

4 A Solution for the Crisis: Open Science enables Reproducible Science




• Greater availability

and accessibility of

publicly funded

scientific research


• Possibility for

rigorous peer-review


• Greater

reproducibility and

transparency of

scientific works;

• Greater impact of

scientific research.

Open Science is the

movement to make

scientific research

and data accessible

to all

4 Reality check: Gollum (still) beats Prometheus by 10:1


• Gift culture still prevails

• It's not the technology

• It's not the generational change

• How to trigger cultural change ?

Science Technology Medicine (STM):

2006-2016: ~ 30 million papers published

~ 3 million data publications

(Klump 2017)


4 Paradigm Change induced by Funding Agencies: Watering hole approach instead of stick & carrot

Carrot & stick did not work

Control the watering hole: Works (for now)

4 FAIR principles: As guidelines


“The problem the FAIR Principles address

is the lack of widely shared, clearly

articulated, and broadly applicable best

practices around the publication of

scientific data”

5 Technical Requirement for FAIR

• Easy and permanent access to

research data via the internet

• Enhanced discovery, retrieval

and management of data to

enable data reuse and

verification of research results

5 Benefits of Citation

• Including citable data in related publications increases

the citation rate of those publications

• Only cited data can be counted and tracked (in a similar

manner to journal articles) to measure impact

• Routine citation of data will assist in gaining

acknowledgement of data as a first class research output

• Citations for published data can be included in CVs along

with journal articles, reports and conference papers

5 Technical Challenge: Unbreakable internet-based Citation

Stable linking needed

• Data will move, URL links to web pages will break.

• Unbreakable alternative needed !

5 Digital Object Identifiers (DOI)

• International DOI Foundation was founded in 1998.

• The DOI system offers long-term persistence and

accessibility of data.

• Based on the Handle system.

• In May 2012 the DOI System ISO Standard 26324 was


• Part of the quality control is mandatory metadata for

each object registered with a DOI.

5 What is a DOI ?

DOI: Acronym for "digital object identifier".

A DOI name is an identifier (not a location) of an entity on digital


What you see: alphanumeric string (never changes)

Associated with: location (such as URL)

Accompanied with: who, what, when… (metadata)
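The split between the unchanging identifier and its current location can be illustrated with a short sketch. It assumes only the public resolver at https://doi.org/, which redirects a DOI name to wherever the object currently lives. The function names are hypothetical helpers written for this example.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"

# Common ways a DOI is written in citations; "doi:" is checked last.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(text):
    """Strip a leading resolver URL or 'doi:' label and return the bare DOI name."""
    doi = text.strip()
    lowered = doi.lower()
    for prefix in KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    # Every DOI name starts with the directory indicator "10." and has a suffix.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI name: {text!r}")
    return doi


def doi_to_url(text):
    """Build the resolver URL for a DOI name; the DOI itself never changes."""
    return DOI_RESOLVER + quote(normalise_doi(text), safe="/")


if __name__ == "__main__":
    print(doi_to_url("DOI: 10.1038/sdata.2016.18"))
```

If the data set moves, only the location registered behind the DOI is updated; citations that use the resolver URL keep working.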

5 DataCite Metadata Schema: Mandatory properties

Part of the quality control is mandatory metadata for each

object registered with a DOI:

• Identifier (with type attribute)

• Creator (with type and nameIdentifier attributes)

• Title (with optional type attribute)

• Publisher

• PublicationYear
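A quick completeness check for the properties listed above could look like the sketch below. The dictionary layout and key names are assumptions made for this example and are not DataCite's actual XML or JSON format. The example values come from the SOEP citation in this deck. Current versions of the schema may require additional properties, so check the schema documentation before registering.

```python
# Keys mirror the five properties listed above; the dict layout is a
# hypothetical simplification, not the DataCite serialisation.
MANDATORY = ("identifier", "creators", "titles", "publisher", "publicationYear")

soep_record = {
    "identifier": {"type": "DOI", "value": "10.5684/soep.v29"},
    "creators": ["Schupp, Jürgen", "Kroh, Martin", "Goebel, Jan"],
    "titles": ["Sozio-oekonomisches Panel (SOEP), Daten der Jahre 1984-2012"],
    "publisher": "SOEP - Sozio-oekonomisches Panel",
    "publicationYear": 2013,
}


def missing_mandatory(record):
    """Return the listed mandatory properties that are absent or empty."""
    missing = []
    for name in MANDATORY:
        value = record.get(name)
        if value is None or value == "" or value == [] or value == {}:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing:", missing_mandatory(soep_record))
```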

5 DOI is a quality label for data

Datasets with a DOI have to be:

Stable (i.e. not going to be modified)

Complete (i.e. not going to be updated)

Permanent – by assigning a DOI we’re committing to make

the dataset available for posterity

Good quality – by assigning a DOI it's receiving the data

centre’s stamp of approval, saying that it’s complete and all

the metadata is available

DOI: Seal of


5 DOI for Research Data

5 DOI Citation Examples

Fahrenberg, Jochen (2010): Freiburger Beschwerdenliste FBL. Primärdaten der

Normierungsstichprobe 1993. Version 1.0.0. ZPID - Leibniz-Zentrum für Psychologische

Information und Dokumentation.


Rattinger, Hans; Roßteutscher, Sigrid; Schmitt-Beck, Rüdiger; Weßels, Bernhard (2012):

Wahlkampf-Panel (GLES 2009). Version: 3.0.0. GESIS Datenarchiv.


Schupp, Jürgen; Kroh, Martin; Goebel, Jan; Bartsch, Simone; Giesselmann, Marco et al. (2013): Sozio-oekonomisches Panel (SOEP), Daten der Jahre 1984-2012. Version: 29. SOEP - Sozio-oekonomisches Panel.

Dataset. doi:10.5684/soep.v29.
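The citations above follow a recognisable pattern: creators, year, title, version, publisher, resource type, DOI. The helper below is a hypothetical sketch that assembles such a string from its parts. The exact punctuation is an assumption modelled on the SOEP example, and repositories usually state their own preferred citation format.

```python
def format_data_citation(creators, year, title, version, publisher, doi):
    """Assemble a data citation string in the style of the SOEP example."""
    names = "; ".join(creators)
    return (
        f"{names} ({year}): {title}. Version: {version}. "
        f"{publisher}. Dataset. doi:{doi}"
    )


if __name__ == "__main__":
    print(
        format_data_citation(
            creators=["Schupp, Jürgen", "Kroh, Martin"],
            year=2013,
            title="Sozio-oekonomisches Panel (SOEP), Daten der Jahre 1984-2012",
            version="29",
            publisher="SOEP - Sozio-oekonomisches Panel",
            doi="10.5684/soep.v29",
        )
    )
```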

5 DOI System Architecture

5 DataCite Services

5 Upcoming: Search DOI-registered datasets by ORCID

Find any DOI-registered publication by ORCID

Example: Löwe / Loewe / Lowe ?

Which of the four Peter Löwe ?

6 Data Curation Continuum: Research Data Repositories

[Figure: Data Curation Continuum; transitions labelled Transfer, Transfer, Publication]

6 re3data: Registry of Research Data Repositories

1,500 research data repositories

described by tags:

6 re3data: Search options

6 Research Data Repository (RDR) Development and Services

Currently, DFG funds two RDR-related Projects:

1. SowiDataNet: addressing the social sciences

2. RADAR: addressing the long tail of Science

Technology and Metadata are compatible.

RADAR is a service offering by FIZ Karlsruhe (testing phase)

Near future:

• SowiDataNet will become a service offering (GESIS)

• Datorium will merge with SowiDataNet

6 RADAR: Research Data Repository Services

Van den Broel K, Furtado F, Engel T (2015): RADAR – A Research Data Repository for the “Long-Tail of Science”

6 RADAR: Research Data Repositories Roles & Responsibilities

Repository for Social Science and Economic Science

6 Datorium: Data Set Description

6 Datorium: Terms of Access

4 Where NOT to „publish“ your Data


Professional repositories which enable

• long term access,

• search,

• retrieval,

• thorough metadata

6 Alternative (Self help): All-purpose Repositories

Rueda, Laura. (2017, May). Introduction to DataCite. Zenodo.

6 OPENAIRE: RDM on the European Level


6 Adoption of Open Science in Europe

6 Forschungsdaten in den Sozial- und Wirtschaftswissenschaften

6 Handbuch Forschungsdatenmanagement

ISBN 978-3-88347-283-6 PDF:

6 Rat für Sozial- und Wirtschaftsdaten / DFG



6 Data Carpentry Workshops


7 Wise Advice


Mistakes I’ve made as an early career researcher

APRIL 5, 2016

Nicola Hemmings (post-doc, University of Sheffield)

Failing to organise my data adequately (circa 2007).

“Prepare your datasets like you would if you were giving them to a

stranger who knew nothing about them. Label, annotate and

meticulously file your R scripts. Incorporate read-me files into everything

and write them for the monkey that will be you in five years, when you

return to your data and/or analyses for some unforeseen but vitally

important reason. Don’t get this wrong. You will regret it.“

7 Back to the start: Snafu? Things are getting better

• This film is scientific nontextual information

• It is available on the AV-portal of TIB Hannover, a data portal for scientific audiovisual content.

• DOI-link:

Thank you for your attention.

DIW Berlin — Deutsches Institut

für Wirtschaftsforschung e.V.

Mohrenstraße 58, 10117 Berlin

Editor: Peter Löwe ([email protected])

Based on the works of

• Paul Wong (2017) ANDS, Research Integrity Advisor Data Management Workshop

• 3TU.Datacentre (2014): Data citation and DOIs

• and others

