(Toward) Making Data Management Easy

Post on 21-Jan-2015


Data management presentation at ALA Annual to the ACRL STS Hot Topics meeting


Toward… Making Data Management Easy

ALA Annual 2011

Joan Starr

University of California Curation Center

California Digital Library

STS Programs are sponsored by:

HOT TOPICS DISCUSSION GROUP

• Introductions
• The research life cycle
• Some examples from CDL/UC3 (curation micro-services and more!)
• …with a focus on EZID
• Discussion/Questions

California Digital Library (CDL)

University of California Curation Center, California Digital Library

Research has a life cycle.

Judy Baxter, http://www.flickr.com/photos/judybaxter/9825836/

Life cycle stages: DISCOVER, SHARE, CREATE, GATHER, PUBLISH, PRESERVE, ACCESS, COLLECT

Librarians can jump in at any point.

Ims.photo: http://www.flickr.com/photos/bigblackbox/4805557065/

What this means for Data Management

TOOLS & SERVICES
• To enable data preservation
• To bake data curation into data creation
• To enhance data sharing, collecting and gathering
• To facilitate data publication

PARTNERSHIPS
• To promote data discovery and access
• To help researchers comply with new requirements


Examples from CDL & UC3

TOOLS & SERVICES
• Micro-services & Merritt
• DCXL
• WAS
• Data Paper model
• EZID

PARTNERSHIPS
• DataONE & DataCite
• Data Management Plan Tool


Curation Micro-services

Individual, small & self-contained components in custom combinations can solve complex problems.

photo by Joan Starr

Building blocks
https://confluence.ucop.edu/display/Curation/Home

• Persistent identifiers
• Persistent storage
• Fixity
• Replication
• Characterization
• Discovery
• Transformation
• Notification
• Annotation

Windell Oskay: http://www.flickr.com/photos/oskay/265899811
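To make one of these building blocks concrete, here is a minimal Python sketch of a fixity check. It is an illustration only, not the UC3 micro-service implementation: the function names and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib


def sha256_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, recorded_digest):
    """Hypothetical helper: True if the file still matches its recorded digest."""
    return sha256_digest(path) == recorded_digest
```

The idea is to record the digest when an object is deposited and recompute it later; a mismatch signals that the stored bytes have changed.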


Version 2

Merritt is: Micro-services “Off the Shelf”

http://www.cdlib.org/services/uc3/merritt

• EZID
• CAN/Pairtree/Dflat/ReDD
• Fixity
• Replication
• JHOVE2
• XTF

Merritt repository

Preservation back-end for existing discovery services

Dark archive for preservation masters

Integration with distributed data grids

Bright archive for preservation and end-user access

From CDL/UC3

TOOLS & SERVICES
Micro-services & Merritt
• DCXL
• WAS
• Data Paper model
• EZID

PARTNERSHIPS
• DataONE & DataCite
• Data Management Plan Tool

Life cycle stages: PRESERVE

WHY EXCEL?

• CON: poor feature set and scalability compared to DBMSs

• PRO: ubiquity, familiarity, ease-of-use

DCXL: Data Curation Excel

Cody Simms: http://www.flickr.com/photos/jcodysimms/246023851

What an Excel add-in could do

• Permit standardized column headers
• Versioning and standard date formats
• Auto-archiving and persistent id assignment
• “Speed bumps” to discourage macros et al.

• NOTE: This will be released as OPEN SOURCE!
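As a rough illustration of what "standardized column headers" and "standard date formats" could mean in practice, here is a small Python sketch. It is not DCXL code: the list of accepted input formats, the function names, and the choice of ISO 8601 (YYYY-MM-DD) as the target format are all assumptions made for this example.

```python
from datetime import datetime

# Hypothetical input formats a spreadsheet date column might contain.
# Note: "%m/%d/%Y" assumes US-style month/day order.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y", "%B %d, %Y")


def to_iso_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_header(header):
    """Lowercase a column header and join its words with underscores."""
    return "_".join(header.strip().lower().split())
```

Returning None for unrecognized values, rather than guessing, leaves it to the researcher to resolve ambiguous entries.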

From CDL/UC3

TOOLS & SERVICES
Micro-services & Merritt
DCXL
• WAS
• Data Paper model
• EZID

PARTNERSHIPS
• DataONE & DataCite
• Data Management Plan Tool

Life cycle stages: CREATE, PRESERVE

Web Archiving Service snapshot

Stats: Since January 2007
• 21 organizations using service
• 4,681 sites captured
• 44,468 captures run
• 26.4 terabytes
• 100+ archives under construction
• 35 archives published

In partnership with the IIPC consortium of national libraries.

Archiving the Gulf oil spill
• 946 sites
• 8,400+ captures
• 1.3 TB
• Began May 5

Improving support for collaboration

From CDL/UC3

TOOLS & SERVICES
Micro-services & Merritt
DCXL
WAS
• Data Paper model
• EZID

PARTNERSHIPS
• DataONE & DataCite
• Data Management Plan Tool

Life cycle stages: SHARE, CREATE, GATHER, PRESERVE, COLLECT

The Data Paper Model

• Minimal: a cover sheet and a set of links to archived artifacts

• Best practice: citation elements (including persistent identifier)

Kevin Steele: http://www.flickr.com/photos/kevinsteele/20631162/

The Data Paper Model

1. Cover sheet with citation data

2. title, date, authors, abstract, and persistent identifier (DOI, ARK, etc.)
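The cover-sheet elements listed above can be pictured as a simple record. The following Python sketch is only an illustration under stated assumptions: the class name, the field names, the citation layout, and every sample value (including the identifier) are placeholders, not part of any published data paper specification.

```python
from dataclasses import dataclass, field


@dataclass
class CoverSheet:
    """Hypothetical cover sheet for a data paper."""
    title: str
    date: str            # assumed to be ISO 8601, e.g. "2011-06-26"
    authors: list
    abstract: str
    identifier: str      # persistent identifier (DOI, ARK, etc.)
    artifact_links: list = field(default_factory=list)

    def citation(self):
        """Build a citation string; this layout is an assumption, not a standard."""
        names = "; ".join(self.authors)
        year = self.date[:4]
        return f"{names} ({year}). {self.title}. {self.identifier}"


# Placeholder values for illustration only.
sheet = CoverSheet(
    title="Coastal Survey Data",
    date="2011-06-26",
    authors=["A. Researcher", "B. Colleague"],
    abstract="Placeholder abstract.",
    identifier="ark:/99999/fk4example",
    artifact_links=["https://repository.example.org/survey.csv"],
)
print(sheet.citation())
```

Keeping the links to archived artifacts next to the citation elements mirrors the "minimal" form of the model: a cover sheet plus a set of links.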

The Data Paper Model

• A data journal
  – Incorporation of elements to enrich discovery, re-use, and archiving
  – Discipline specific
  – Peer reviewed

From CDL/UC3

TOOLS & SERVICES
Micro-services & Merritt
DCXL
WAS
Data Paper model
• EZID

PARTNERSHIPS
• DataONE & DataCite
• Data Management Plan Tool

Life cycle stages: SHARE, CREATE, GATHER, PUBLISH, PRESERVE, COLLECT

An article about data, but no data

FTP site

And then the hunt for the data…


The EZID difference: data linked…


…to the scholarly publication

• Create a persistent identifier: DOI or ARK
• Add object location
• Add metadata
• Update object location
• Update object metadata
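To show why these operations keep citations stable, here is a toy in-memory model in Python. It is not the EZID API and makes no network calls: the class, its method names, and the sample identifier and URLs are all hypothetical, chosen only to mirror the operations listed above.

```python
class IdentifierRegistry:
    """Toy in-memory stand-in for a persistent identifier service (hypothetical)."""

    def __init__(self):
        self._records = {}

    def create(self, identifier, location, metadata=None):
        """Register an identifier with its object location and metadata."""
        if identifier in self._records:
            raise ValueError(f"{identifier} already exists")
        self._records[identifier] = {
            "location": location,
            "metadata": dict(metadata or {}),
        }

    def update_location(self, identifier, new_location):
        """Point an existing identifier at a new object location."""
        self._records[identifier]["location"] = new_location

    def update_metadata(self, identifier, **changes):
        """Add or change metadata fields for an existing identifier."""
        self._records[identifier]["metadata"].update(changes)

    def resolve(self, identifier):
        """Return the current location for an identifier."""
        return self._records[identifier]["location"]

    def metadata(self, identifier):
        """Return a copy of the metadata for an identifier."""
        return dict(self._records[identifier]["metadata"])


# Placeholder identifier and URLs for illustration only.
registry = IdentifierRegistry()
pid = "ark:/99999/fk4example"
registry.create(pid, "ftp://old-server.example.org/survey.csv", {"title": "Survey data"})

# The data moves, but the identifier used in citations stays the same.
registry.update_location(pid, "https://repository.example.org/survey.csv")
registry.update_metadata(pid, creator="A. Researcher")
print(registry.resolve(pid))
```

Anything that cites the identifier, rather than the FTP address, keeps working after the move because only the registry entry changes.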

Meeting researcher needs

• Early in the research life cycle
• Working on a federated team
• Making a career move
• Meeting funder requirements

Early in the research life cycle

Data-intensive research + Writing up the results

Where’s the data? What if I move it?

With EZID: all your references, citations, links, etc. will be stable!

by Dave Rogers http://www.flickr.com/photos/dave-rogers/2815036285/

Working on a federated team

©All rights reserved by University of California, http://www.flickr.com/photos/universityofcalifornia/5405812887

Data-intensive research + Regional research center + Aging infrastructure

Where’s the data? We have to move it!

With EZID: all your references, citations, links, etc. will be stable!

Making a career move

• Data-intensive research + Researcher(s) on the move

I know where my data is and I’m taking it with me!

©All rights reserved by University of California, http://www.flickr.com/photos/universityofcalifornia/5406308654

With EZID: all your references, citations, links, etc. will be stable!

Meeting funder requirements

• Data-intensive research + Grantor requirements for data management plan

How do we track the data?

What do we put here?

With EZID: track your data from capture to publication and beyond.

By David Mellis, http://www.flickr.com/photos/mellis/7675610/

Working with Libraries & Data Centers

• Libraries
  – Extending an historic role
• Data Centers & Publishers
  – Providing workflows and standards

EZID: Meeting library needs

• New kinds of scholarly output + Continued need to build collections

©All rights reserved by University of California, http://www.flickr.com/photos/universityofcalifornia/5098256828

With EZID: you can extend your historic activities & preserve your institution’s research investment.

How do we keep track of all this new stuff?

EZID: Meeting data center needs

©All rights reserved by University of California, http://www.flickr.com/photos/universityofcalifornia/5325618610

• New demands for storage + Changing landscape

With EZID: use simple tools, and easy workflows. Work with international standards.

They want what? When?

http://n2t.net/ezid/

Examples from CDL/UC3

TOOLS & SERVICES
Micro-services & Merritt
DCXL
WAS
Data Paper model
EZID

PARTNERSHIPS
• DataONE & DataCite
• Data Management Plan Tool

Life cycle stages: SHARE, CREATE, GATHER, PUBLISH, PRESERVE, COLLECT

enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it

1. Build on existing cyberinfrastructure

2. Create new cyberinfrastructure

3. Create new communities of practice

Working at the Network Level

DataONE’s new infrastructure
https://www.dataone.org/

From CDL/UC3

TOOLS & SERVICES
Micro-services & Merritt
DCXL
WAS
Data Paper model
EZID

PARTNERSHIPS
DataONE & DataCite
• Data Management Plan Tool

Life cycle stages: DISCOVER, SHARE, CREATE, GATHER, PUBLISH, PRESERVE, ACCESS, COLLECT

Data Management Plan Tool
https://bitbucket.org/dmptool/main/wiki/Home

• Collaborative effort
• Funders’ data mgmt/sharing policies
• Journals’ (Nature, Science, and PLoS) data sharing requirements
• Researchers
  – Distributing research results leads to increased citations (Piwowar et al., 2007)
  – A shared, common data set may help researchers collaborate and accelerate discoveries (NY Times, 2010)
  – Better organization, leading to easier preservation
  – Cultivate quality and efficiency

Thanks to Jeffrey Loo, Chemical Informatics Librarian, UCB


Home screen: once logged in, the user is presented with a view of their work and options.


From CDL/UC3

TOOLS & SERVICES
Micro-services & Merritt
DCXL
WAS
Data Paper model
EZID

PARTNERSHIPS
DataONE & DataCite
Data Management Plan Tool

Life cycle stages: DISCOVER, SHARE, CREATE, GATHER, PUBLISH, PRESERVE, ACCESS, COLLECT

Summary: Just how easy is it for you?

• Build your own (Curation micro-services)
  – specs
  – code
• Open source tools
  – DCXL
  – Data Management Plan tool
• Off the shelf options
  – Merritt
  – EZID
  – WAS

liquidnight: http://www.flickr.com/photos/liquidnight/3101493460/

& how easy is it for researchers?

• For organizing their data
  – DCXL, EZID
• To keep their data safe
  – Merritt, Micro-services
• To help them get grants
  – Data Management Plan tool
• To help get their work noticed
  – EZID, Data Papers
• To help them find other data
  – EZID, Data Papers

liquidnight: http://www.flickr.com/photos/liquidnight/3101493460/

TOOLS!

• CURATECamp: unconference events connecting practitioners & technologists interested in digital curation and data management.

• Next f2f event: August 15 – 16, 2011, Stanford University, Palo Alto, California

• http://www.regonline.com/Register/Checkin.aspx?EventID=953543

• http://groups.google.com/group/digital-curation

• http://curatecamp.org/

But wait, there’s more: Community!

courtesy of Oxnard Public Library, http://content.cdlib.org/ark:/13030/kt6c600758

and more information!

UC Curation Center
http://www.cdlib.org/uc3
uc3@ucop.edu

EZID
http://n2t.net/ezid/

Micro-services
http://www.cdlib.org/uc3/curation
http://groups.google.com/group/digital-curation

UC3/CDL
Stephen Abrams, David Loy, Patricia Cruse, Lisa Colvin, Scott Fisher, Mark Reyes, Erik Hetzner, Tracy Seneca, Greg Janée, Joan Starr, John Kunze, Marisa Strong, Margaret Low, Perry Willett

…and here’s how to find me.

Joan Starr
joan.starr@ucop.edu
@joan_starr
http://www.slideshare.net/joanstarr

Image credits for Opening Slide

Optical Shop, Adam Reeder, http://www.flickr.com/photos/adamreeder/5379477315
Streetcar, Adam Reeder, http://www.flickr.com/photos/adamreeder/5379459127
Jazz Gumbo, Adam Reeder, http://www.flickr.com/photos/adamreeder/5380083448
Boat, Adam Reeder, http://www.flickr.com/photos/adamreeder/5379429155
Garden, ncpttmedia, http://www.flickr.com/photos/ncpttmedia/4008605841
Shutters, OZinOH, http://www.flickr.com/photos/75905404@N00/379444291
