RDA, Data Citation, and PIDs for DataOne

Post on 15-Apr-2017

Unless otherwise noted, the slides in this presentation are licensed by Mark A. Parsons under a Creative Commons Attribution-Share Alike 3.0 License

Building Collaborative Bridges: Opportunities and Challenges for Data Sharing and Citation

Mark A. Parsons, ORCID 0000-0002-7723-0950, Secretary General

DataONE Webinar, 10 May 2016

All of society’s grand challenges require diverse (often large) data to be shared and integrated across cultures, scales, and technologies.

Research Data Alliance

Vision: Researchers and innovators openly share data across technologies, disciplines, and countries to address the grand challenges of society.

Mission: RDA builds the social and technical bridges that enable open sharing of data.

Infrastructure is relationships, interactions, and connections between people, technologies, and institutions.

Fran Berman, Research Data Alliance

“Create - Adopt - Use” (in 12-18 months)

Systems Interoperability

Adopted Policy

Sustainable Economics

Common Types, Standards, Metadata

Traffic image: Mike Gonzalez

Adopted Community Practice

Training, Education, Workforce

Shared Principles

• Openness

• Consensus

• Balance

• Harmonization

• Community Driven

• Non-profit

[Chart: RDA membership by quarter (May-July, Aug-Oct, Nov-Jan, Feb-Apr, repeated over three years): 392, 991, 1274, 1656, 2048, 2404, 2636, 2881, 3126, 3434, 3698, 3976]

The RDA Community: ~4,000+ members from 110 countries (April 2016)

• Europe 48%

• North America 34%

• Asia 9%

• Australasia 5%

• Africa 3%

• South America 1%

https://rd-alliance.org/about-rda/who-rda.html

70+ Working and Interest Groups

RDA Organisational Members

Organisational & Affiliate Members

RDA Affiliate Members

https://rd-alliance.org/organisation/rda-organisation-affiliate-members.html

Fran Berman, Research Data Alliance

RDA: Accelerate Data Sharing and Interoperability Across Cultures, Communities, Scales, Technologies

▪ Technical parts of the data engine:

  ▪ Data type registries reference model

  ▪ Wheat data interoperability framework

▪ Rules of the road:

  ▪ Common agreement on data citation

  ▪ Common practice for data repositories

  ▪ Principles of legal interoperability

▪ Better drivers:

  • Summer schools in data science and cloud computing in the developing world (with CODATA)

  • Active data management plan development and monitoring

Policy and Practice

Systems Interoperability

Sustainable Economics

Common Types, Standards, Metadata

Training, Education, Workforce

Some themes amidst the difference

1. Persistent Identifiers for data, documents, people, organisations, instruments—Everything!

2. Certifying Trust in assertions, evidence, organisations, processes…

3. The value of Conversations, Relationships, and Mediation — an agile network effect.

An Area of Convergence and Agreement

Internet Domain

nodes with IP numbers

packages being exchanged

standardized protocols

Data Domain

objects with PID numbers

objects being exchanged

standardized protocols

Slide courtesy P. Wittenburg from L. Lannom from D. Clark
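To make the analogy concrete: just as an IP number is looked up to reach a node, a PID is handed to a resolver to reach an object. The sketch below is a minimal illustration, not part of the original slides. The helper name `resolver_url` and the `scheme:value` input convention are assumptions made here; the three resolver base URLs are the public DOI, Handle, and N2T (ARK) resolvers. The identifiers in the usage lines are made-up placeholders.

```python
# Hypothetical helper: turn a "scheme:value" PID into a resolver URL.
# The scheme-to-resolver table is an illustrative subset, not a full registry.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
    "ark": "https://n2t.net/ark:",
}


def resolver_url(pid: str) -> str:
    """Return the HTTP URL at which a resolver can redirect to the object."""
    scheme, _, value = pid.partition(":")
    scheme = scheme.lower()
    if scheme not in RESOLVERS or not value:
        raise ValueError(f"unsupported or malformed PID: {pid!r}")
    return RESOLVERS[scheme] + value


# Placeholder identifiers, for illustration only.
print(resolver_url("doi:10.5555/example"))
print(resolver_url("ark:/99999/fk4test"))
```

The point of the indirection is that the citation carries only the PID; where the object currently lives is the resolver's concern.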

Purpose of Data Citation

• Aid scientific reproducibility through direct, unambiguous connection to the precise data used.

• Credit for data authors and stewards

• Accountability for creators and stewards

• Track impact of data set

• Help identify data use (e.g., trackbacks)

  • Data authors can verify how their data are being used.

  • Users can better understand the application of the data.

• A locator/reference mechanism not a discovery mechanism per se

Crisis of Confidence in Research Data Citation

The Evolution of Data Citation

• Data was part of the literature—tables, maps, monographs, etc.—and we cited accordingly. (Some data were still hoarded).

• Digital data becomes the norm. It’s messier and we forget how to cite it routinely.

• Initial efforts to define digital data citation in the 90s - early 00s

  • Right idea, little traction

  • Partially conflated with the citing URLs issue

• A blossoming in the mid-late 00s.

  • Multiple disciplines start developing approaches and guidelines

  • DOI a big driver, especially for DataCite, but other identifiers used too (Handles, LSIDs, UNFs, ARKs and good ol’ URI/Ls)

  • A somewhat competitive atmosphere

• Finally consensus through the Joint Declaration of Data Citation Principles, 2013

Joint Declaration of Data Citation Principles (Overview)

The Noble Eight-Fold Path to Citing Data

1. Importance

2. Credit and attribution

3. Evidence

4. Unique Identification

5. Access

6. Persistence

7. Specificity and verifiability

8. Interoperability and flexibility

Principles are supplemented with a glossary, references and examples: http://force11.org/datacitation
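As a rough illustration of principles 2, 4, and 7, a citation can be treated as a small record that names the authors, carries a resolvable identifier, and pins a version. The sketch below is an assumption made for illustration: the principles do not prescribe a layout, the class name `DataCitation` and its fields are hypothetical, and every value in the usage line is a placeholder rather than a real dataset.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical citation record; field names are illustrative only."""
    authors: Tuple[str, ...]   # credit and attribution
    year: int
    title: str
    version: str               # specificity: which version was used
    publisher: str
    pid_url: str               # unique, resolvable identification

    def render(self) -> str:
        # One possible layout; communities are free to choose their own.
        names = "; ".join(self.authors)
        return (f"{names} ({self.year}): {self.title}. "
                f"Version {self.version}. {self.publisher}. {self.pid_url}")


# Placeholder values, not a real dataset.
example = DataCitation(
    authors=("Doe, J.", "Roe, R."),
    year=2016,
    title="Example Sea Ice Extent",
    version="2.1",
    publisher="Example Data Center",
    pid_url="https://doi.org/10.5555/example",
)
print(example.render())
```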

Citing Dynamic Data

Data Citation: Data + Means-of-access

▪ Data → time-stamped & versioned (aka history)

Researcher creates working-set via some interface:

▪ Access → assign PID to QUERY, enhanced with

  − Time-stamping for re-execution against versioned DB

  − Re-writing for normalization, unique-sort, mapping to history

  − Hashing result-set: verifying identity/correctness

leading to landing page

S. Pröll, A. Rauber. Scalable Data Citation in Dynamic Large Databases: Model and Reference Implementation. In IEEE Intl. Conf. on Big Data 2013 (IEEE BigData2013), 2013. http://www.ifs.tuwien.ac.at/~andi/publications/pdf/pro_ieeebigdata13.pdf

Output / Results: http://bit.ly/1T1HHXI

▪ 14 Recommendations grouped into 4 phases:

  - Preparing data and query store

  - Persistently identifying specific data sets

  - Resolving PIDs

  - Upon modifications to the data infrastructure

▪ Still open for comment by members

▪ See RDA Magazine for overview and adoption cases

▪ Reference implementations (SQL, CSV, XML)

▪ Pilots

Getting involved

Individuals: ✓ Observers ✓ Contributors ✓ Drivers

• Members

• WGs - IGs - BoFs

• Requests for Comments

• Plenaries

Organisations: ✓ Insight ✓ Adopt ✓ Drive

• Member

• WGs - IGs - BoFs

• RfCs

• Funded projects

• Adoption / Uptake

National level: ✓ Coordination & Knowledge Exchange, Strategy &/or Implementation

• Papers & Events

• Meetings & Fora

• Training & Workshops

• Uptake pilots

https://rd-alliance.org/get-involved.html

12-16 September 2016 in Denver, Colorado, USA

Info: enquiries@rd-alliance.org

@resdatall

RDA Interest (IG) and Working Groups (WG) by Focus 1, February 2016

RDA Interest (IG) and Working Groups (WG) by Focus 2, February 2016