Or 2013-abrams-sharing-data-rich-research

Sharing Data-Rich Research Through Repository Layering
Stephen Abrams, California Digital Library
Angela Rizk-Jackson and Julia Kochi, University of California, San Francisco
Noah Wittman, University of California, Berkeley

description

Merritt’s micro-services-based architecture provides a number of options for easy integration with diverse external discovery services with specific disciplinary focus on scientific data sharing. By removing many of the barriers faced by researchers interested in data publication, the integrations of Merritt with DataShare and Research Hub exemplify a new service model for cooperative and distributed data sharing. The widespread adoption of such sharing is critical to open scientific inquiry and advancement.

Transcript of Or 2013-abrams-sharing-data-rich-research

Page 1: Or 2013-abrams-sharing-data-rich-research

Sharing Data-Rich Research Through Repository Layering

Stephen Abrams, California Digital Library

Angela Rizk-Jackson and Julia Kochi, University of California, San Francisco

Noah Wittman, University of California, Berkeley

Page 2: Or 2013-abrams-sharing-data-rich-research

Why is data curation important?

Accelerating scientific progress
Enabling appropriate scrutiny and verification of results
Promoting integrity and debate
Facilitating new collaborations
Avoiding needless duplication of effort
Increasingly, complying with institutional policies, publication requirements, and funder mandates

Cf. Whyte and Tedds (2011), “Making the case for research data management,” DCC briefing paper, www.dcc.ac.uk/resources/briefing-papers/making-case-rdm

Page 3: Or 2013-abrams-sharing-data-rich-research

The library’s role

A continuation of its long-standing mission and practice to connect patrons with content of interest in meaningful ways across barriers of space and time

Cf. Tenopir et al. (2012), “Academic librarians and research data services: Preparation and attitudes,” 78th IFLA General Conference and Assembly, Helsinki, conference.ifla.org/past/ifla78/116-tenopir-en.pdf

Offering solutions that enhance the natural points of alignment between the scholarly research and information lifecycles

[Diagram: the scholarly lifecycle and the information lifecycle shown side by side. Stage labels: Create, Collect, Research, Curation, Preserve, Access, Discover, Share, Publish, Reuse]

Page 4: Or 2013-abrams-sharing-data-rich-research

Merritt

Curation repository available to the UC community and external partners
Preservation and access
Content agnostic, model free
Highly decentralized micro-services architecture

Cf. Abrams, Cruse, Kunze, and Minor (2011), “Curation micro-services: A pipeline metaphor for repositories,” Journal of Digital Information 12(2), journals.tdl.org/jodi/article/view/1605

26 curatorial units
271 collections
325,000 objects
450,000 versions
4,500,000 files
13 TB

www.cdlib.org/uc3/merritt
merritt.cdlib.org

Page 5: Or 2013-abrams-sharing-data-rich-research

Merritt

[Architecture diagram. Labeled components: user agent; load balancers; UI/API; LDAP; ingest; inventory; storage broker; storage nodes; ONEShare UNM storage node; fixity; message queue; RDBMS; No-SQL; SAN; SDSC cloud; EZID; DataCite; IDF; DataONE member node; DataONE coordinating node; Web of Knowledge; Primo]

Page 6: Or 2013-abrams-sharing-data-rich-research

(Some) issues to address

Scale
  Individual objects ranging from 0 to 47,000 files
  Individual files ranging from 0 to 14 GB

Maintaining control
  Concern over potential loss of control over dissemination and use of data

User experience
  Switch from organizational to individual interaction

www.flickr.com/photos/vixon/116447718
www.flickr.com/photos/traftery/4319529821
www.flickr.com/photos/32195273@N05/51076852642

Page 7: Or 2013-abrams-sharing-data-rich-research

(Some) issues to address

Augment repository function by composition (when possible) and addition (when necessary)
  Loosely coupled integration with external community-supported systems and services

Page 8: Or 2013-abrams-sharing-data-rich-research

Scale

Avoiding client timeout
  ≤ 2 GB: File-based stream-based AIP-to-DIP processing
  > 2 GB: Asynchronous delivery
    Email notification with personalized, time-limited URL

Streamlined storage provisioning
  SDSC cloud
  cloud.sdsc.edu

www.kevatron.co.uk/converting-8-24-bit-samples-in-coreaudio-on-ios www.flickr.com/photos/paulbhartzog/680749585
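The size-based delivery rule on this slide can be sketched in a few lines. This is a minimal illustration, not Merritt's implementation: the function names, the HMAC signing scheme, the secret key, and the reading of 2 GB as 2 × 2^30 bytes are all assumptions made for the example.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import quote, urlencode

# 2 GB threshold from the slide; treating GB as 2**30 bytes is an assumption.
SYNC_LIMIT_BYTES = 2 * 2**30
# Hypothetical signing key; a real service would load this from configuration.
SECRET_KEY = b"example-secret"


def choose_delivery(size_bytes: int) -> str:
    """Stream small objects within the request; queue large ones for later pickup."""
    return "stream" if size_bytes <= SYNC_LIMIT_BYTES else "async"


def sign(object_id: str, user: str, expires: int) -> str:
    """HMAC-SHA256 over the object, the recipient, and the expiry time."""
    message = f"{object_id}|{user}|{expires}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_timed_url(base_url: str, object_id: str, user: str,
                   ttl_seconds: int, now: Optional[int] = None) -> str:
    """Build a per-user download link that stops validating after ttl_seconds."""
    issued = int(time.time()) if now is None else now
    expires = issued + ttl_seconds
    query = urlencode({"user": user, "expires": expires,
                       "sig": sign(object_id, user, expires)})
    return f"{base_url}/{quote(object_id, safe='')}?{query}"


def is_valid(object_id: str, user: str, expires: int, sig: str,
             now: Optional[int] = None) -> bool:
    """Accept a link only if it is unexpired and its signature matches."""
    current = int(time.time()) if now is None else now
    expected = sign(object_id, user, expires)
    return current <= expires and hmac.compare_digest(expected, sig)
```

In this sketch the link is "personalized" because the user name is part of the signed message, so a link issued to one person does not validate for another, and "time-limited" because the expiry timestamp is signed along with it.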

Page 9: Or 2013-abrams-sharing-data-rich-research

Control

Data use agreements (DUAs)
  Explicit assertion of license requirements and terms of use
  Curatorial and consumer notification of acceptance

Cf. Brazhnik and Jones (2007), “Anatomy of data integration,” Journal of Biomedical Informatics 40(3): 252-69, doi:10.1016/j.jbi.2006.09.001

From: [email protected]
Subject: Merritt DUA acceptance

Name: Stephen Abrams
Affiliation: California Digital Library
Collection: UCSF DataShare
Object: Frontotemporal Lobar Degeneration (FTLD)
Date: 2013-05-31 09:50:34 PDT
Terms of use: As part of this agreement, Consumer submits to the following statements:
(1) I will receive access to de-identified data and will not attempt to establish the identity of any of the study subjects.
(2) I will share these data only with my immediate co-workers, and I will not transfer these data to other research groups. I understand that these data are available to other research groups through the process by which I obtain them.
(3) I will require anyone in my group who utilizes these data, or anyone with whom I share these data to comply with this data use agreement

...
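A notification like the one on this slide could be assembled roughly as follows. This is a sketch under stated assumptions: the `DuaAcceptance` class, the `build_notification` function, and the example.org addresses are hypothetical, and only the field labels and subject line come from the slide.

```python
from dataclasses import dataclass
from datetime import datetime
from email.message import EmailMessage
from typing import List


@dataclass
class DuaAcceptance:
    """One consumer's acceptance of a data use agreement (hypothetical record)."""
    name: str
    affiliation: str
    collection: str
    object_title: str
    accepted_at: datetime
    terms: str


def build_notification(acc: DuaAcceptance, sender: str,
                       recipients: List[str]) -> EmailMessage:
    """Compose the acceptance notice sent to both the curator and the consumer."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = "Merritt DUA acceptance"
    # Field labels follow the message shown on the slide.
    msg.set_content("\n".join([
        f"Name: {acc.name}",
        f"Affiliation: {acc.affiliation}",
        f"Collection: {acc.collection}",
        f"Object: {acc.object_title}",
        f"Date: {acc.accepted_at:%Y-%m-%d %H:%M:%S}",
        f"Terms of use: {acc.terms}",
    ]))
    return msg
```

Sending the same message to the curator and the consumer gives both parties a matching record of which terms were accepted, by whom, and when.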

Page 10: Or 2013-abrams-sharing-data-rich-research

User experience

Due to its open eligibility policy, Merritt will always provide a more generic UX than special-purpose or disciplinary systems

Shifting user roles, shifting expectations
  Institutional → individual researcher
  Behavioral expectations set by the commercial/mobile web

Page 11: Or 2013-abrams-sharing-data-rich-research

User experience

Due to its open eligibility policy, Merritt will always provide a more generic UX than special-purpose or disciplinary systems

Shifting user roles, shifting expectations
  Institutional → individual researcher
  Behavioral expectations set by the commercial web

Integration with extant services that better provide the desired UX
  DataShare
  Research Hub

Page 12: Or 2013-abrams-sharing-data-rich-research

DataShare

“The goal of the DataShare project is to catalyze widespread sharing of scientific research data”
datashare.ucsf.edu

UCSF Clinical and Translational Science Institute
ctsi.ucsf.edu

UCSF Library
www.library.ucsf.edu

UCSF Center for Imaging of Neurodegenerative Disease
www.radiology.ucsf.edu/cind

Architecture
  DataShare submission client (Ruby/Rails)
  Merritt curation repository
  DataShare discovery portal (XTF/Java)

Page 13: Or 2013-abrams-sharing-data-rich-research

DataShare

Prepare Describe Upload Curate Discover Share

Page 14: Or 2013-abrams-sharing-data-rich-research

DataShare

Prepare Best practice advice

Describe Upload Curate Discover Share

Page 15: Or 2013-abrams-sharing-data-rich-research

DataShare

Prepare Describe

Schema-directed metadata editor

DataCite schema
schema.datacite.org

Upload Curate Discover Share
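One thing a schema-directed editor does is refuse a record until the schema's mandatory properties are filled in. The sketch below is illustrative only and is not the DataShare editor: it checks the five properties that were mandatory in the DataCite schema versions current in 2013 (later versions also require a resource type), and the flat dictionary layout and function name are assumptions.

```python
from typing import Dict, List, Optional

# Mandatory DataCite properties as of the 2.x/3.x schema versions.
REQUIRED_FIELDS = ("identifier", "creator", "title", "publisher", "publicationYear")


def missing_fields(record: Dict[str, Optional[str]]) -> List[str]:
    """Return the mandatory properties that are absent or blank in a record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing
```

An editor built this way would keep the submit action disabled until `missing_fields` returns an empty list.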

Page 16: Or 2013-abrams-sharing-data-rich-research

DataShare

Prepare Describe Upload

File browse or drag-n-drop

Curate Discover Share

Page 17: Or 2013-abrams-sharing-data-rich-research

DataShare

Prepare Describe Upload

File browse or drag-n-drop

Curate Discover Share

Page 18: Or 2013-abrams-sharing-data-rich-research

DataShare

Prepare Describe Upload Curate

Manage datasets

Discover Share

Page 19: Or 2013-abrams-sharing-data-rich-research

DataShare

Prepare Describe Upload Curate Discover

Faceted search and browse

Share

Page 20: Or 2013-abrams-sharing-data-rich-research

DataShare

Prepare Describe Upload Curate Discover Share

DataONE DataCite (soon) Primo

Web of Knowledge SEO

Page 21: Or 2013-abrams-sharing-data-rich-research

Merritt + DataShare

[Architecture diagram. Same labeled components as the Merritt diagram on Page 5, plus: DataShare upload; collection Atom feed; XTF (xtf.cdlib.org); DataShare portal; Lucene]
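The labels on this diagram suggest that the DataShare portal learns about repository content through a collection Atom feed before indexing it with XTF and Lucene. A harvester of that kind might look like the sketch below. This is an assumption about the general pattern, not the actual DataShare code: the sample feed, its identifiers, and the function name are invented, and the plain string comparison of timestamps only works if every timestamp uses the same UTC "Z" format.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

ATOM = "{http://www.w3.org/2005/Atom}"

# Made-up feed for illustration; identifiers and titles are placeholders.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example collection</title>
  <updated>2013-06-02T00:00:00Z</updated>
  <entry>
    <id>ark:/99999/fk4example1</id>
    <title>Dataset one</title>
    <updated>2013-05-30T12:00:00Z</updated>
  </entry>
  <entry>
    <id>ark:/99999/fk4example2</id>
    <title>Dataset two</title>
    <updated>2013-06-02T00:00:00Z</updated>
  </entry>
</feed>"""


def entries_since(feed_xml: str, last_seen: str) -> List[Dict[str, str]]:
    """Return feed entries updated after last_seen (a UTC timestamp string)."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.findall(f"{ATOM}entry"):
        updated = entry.findtext(f"{ATOM}updated", default="")
        # Lexicographic comparison is valid only for uniform UTC timestamps.
        if updated > last_seen:
            fresh.append({
                "id": entry.findtext(f"{ATOM}id", default=""),
                "title": entry.findtext(f"{ATOM}title", default=""),
                "updated": updated,
            })
    return fresh
```

Polling a feed keeps the portal loosely coupled to the repository: the portal only needs the feed URL and the time of its last harvest, not access to repository internals.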

Page 22: Or 2013-abrams-sharing-data-rich-research

Research Hub

“Research Hub provides powerful tools for content management and collaboration”
hub.berkeley.edu

Alfresco CMS
www.alfresco.com

770 projects, 3,900 users
Personal file management
Project collaboration
Departmental resource pooling
Research data management

Desktop sync, mobile app, Adobe Creative Suite

UC Berkeley Information Services and Technology
ist.berkeley.edu

Page 23: Or 2013-abrams-sharing-data-rich-research

Research Hub

Prepare Acquire and arrange

Describe Upload Curate Discover Share

Page 24: Or 2013-abrams-sharing-data-rich-research

Research Hub

Prepare Describe

Schema-directed metadata editors

Upload Curate Discover Share

Page 25: Or 2013-abrams-sharing-data-rich-research

Research Hub

Prepare Describe Upload

Direct action

Curate Discover Share

Page 26: Or 2013-abrams-sharing-data-rich-research

Prepare Describe Upload

Direct action

Curate Discover Share

Research Hub

Page 27: Or 2013-abrams-sharing-data-rich-research

Research Hub

Prepare Describe Upload

Policy-based workflow rules

Curate Discover Share

Page 28: Or 2013-abrams-sharing-data-rich-research

Research Hub

Prepare Describe Upload

Drag-and-drop

Curate Discover Share

Page 29: Or 2013-abrams-sharing-data-rich-research

Research Hub

Prepare Describe Upload

Confirmation

Curate Discover Share

Page 30: Or 2013-abrams-sharing-data-rich-research

Research Hub

Prepare Describe Upload Curate

Manage datasets

Discover Share

Page 31: Or 2013-abrams-sharing-data-rich-research

Research Hub

Prepare Describe Upload Curate Discover Share

Page 32: Or 2013-abrams-sharing-data-rich-research

Research Hub

Prepare Describe Upload Curate Discover Share

Page 33: Or 2013-abrams-sharing-data-rich-research

Merritt + DataShare + Research Hub

[Architecture diagram. Same labeled components as the Merritt + DataShare diagram on Page 21, plus: Research Hub]

Page 34: Or 2013-abrams-sharing-data-rich-research

Next steps

Self-service account registration
  UCTrust and InCommon Shibboleth federations

Additional cloud-based replication

Outreach

Integration with Open Context archaeological portal
  opencontext.org
  Atom-based submission

Integration with Nuxeo
  www.nuxeo.com
  UC system-wide DAMS solution

Integration with Islandora
  islandora.ca
  Collaboration with UCLA Library
  Tuque API

Integration with DPN
  www.dpn.org

Page 35: Or 2013-abrams-sharing-data-rich-research

Sharing research through repositories

Conform to institutional policy, publication requirements, and funder mandates

Pro-active curation of valuable research outputs
Stable citation and access
High visibility publication and discovery
Use metrics

Page 36: Or 2013-abrams-sharing-data-rich-research

Sharing research through repositories

Conform to institutional policy, publication requirements, and funder mandates

Pro-active curation of valuable research outputs
Stable citation and access
High visibility publication and discovery
Use metrics
Repository layering as an appropriate division of labor
Exploiting existing capabilities already in local use

Page 37: Or 2013-abrams-sharing-data-rich-research

For more information

Merritt
www.cdlib.org/uc3/[email protected]
Stephen Abrams, Patricia Cruse, Shirin Faenza, Scott Fisher, Erik Hetzner, Joshua Hubbard, Greg Janée, John Kunze, Rosalie Lack, David Loy, Mark Reyes, Joan Starr, Carly Strasser, Marisa Strong, Bhavitavya Vedula, Kenneth Weiss, Perry Willet

DataShare
datashare.ucsf.edu
Geoffrey Boushey, Anirvan Chatterjee, Maninder Kahlon, Julia Kochi, Angela Rizk-Jackson, Michael Weiner

Research Hub
hub.berkeley.edu
Ian Crew, Michael McCarthy (Tribloom), Patrick McGrath, Noah Wittman

www.slideshare.net/UC3/or-2013abramssharingdatarichresearch