1 11-20-14/Greenberg Metadata Quality and Capital Disseminators and Service Providers November 20,...

1 11-20-14/Greenberg Metadata Quality and Capital Disseminators and Service Providers November 20, 2014 Jane Greenberg Professor, College of Computing & Informatics Director, Metadata Research Center


Page 1:

Metadata Quality and Capital

Disseminators and Service Providers

November 20, 2014

Jane Greenberg

Professor, College of Computing & Informatics

Director, Metadata Research Center

Page 2:

Your data is only as good as your metadata

Metadata is a first-class object

Page 3:

Toothbrush

Page 4:

The topic…

Good enough is not bad (DRYAD)

ROI – return on investment (CAPITAL)

RDA – Research Data Alliance (COMMUNITY)…. time permitting

Page 5:

Page 6:

Page 7:

Page 8:

Pre-populated metadata field

Page 9:

Page 10:

Data downloads reuse citation

Observations, motivating study of metadata capital

1. Metadata generation costs money
2. Metadata reuse is a BIG part of Dryad’s workflow
3. Metadata reuse via OAI
4. Metadata reuse via data sharing, reuse, and repurposing

Download 10678 times

Page 11:

Journal | Re.Wrkfl | Blackout
AmNtrl | N | N
MBE | N | N
BioRisk | Y | N
BMJ Open | Y | N
…. | Y |

Type | Total | 30 days
Data packages | 6781 | 198
Data files | 20832 | 957
Journals | 361 | 72
Authors | 24166 | 3312
Downloads | 635348 | 37611

• Journals (80+…PLOS): http://datadryad.org/pages/integratedJournals

• X >10GB = $15,$10+

Page 12:

Technology
• DSpace
• DOIs via CDL/DataCite
• CC0 (<m> + data)
• Integration with specialized repositories and databases
• Federated searching with TreeBASE and KNB LTER
• TreeBASE submission (OAI-PMH)
• GenBank (currently in development)

Governance
“non-profit status, 12 member Board of Directors”

Sets policy, goals
• science, journals, societies, OCLC, MS

2006 Dryad development – NESCent + <MRC>
• Stakeholders: journals, publishers and scientific societies, and researchers.

2009-2012: Interim Board

$ PAYMENT - Sept. 1, 2014

Page 13:

Page 14:

Singapore Framework

Dryad DCAP, ver. 3.0
• bibo (The Bibliographic Ontology)
• dcterms (Dublin Core terms)
• dryad (Dryad)
• DwC (Darwin Core)

Vision
1. Simple: automatic metadata gen; heterogeneous datasets
   * Data-package centric
2. Interoperable: harvesting, cross-system searching
3. Semantic Web compatible: sustainable; supporting machine processing

Greenberg, et al, 2009, Metadata Best Practice for a Scientific Data Repository, JLM, DOI:10.1080/19386380903405090.

Page 15:

Metadata research & development

1. Curation workflow - cognitive walkthroughs
2. Dryad metadata scheme development - crosswalk analyses (Dube, et al, 2007; Carrier, et al, 2007; White et al., 2008; Greenberg, et al, 2010; Greenberg 2009; 2010)
3. Metadata reuse - content analysis (Greenberg, IDCC Research Summit, 2010)
4. Instantiation - multi-method study (comprehensions assessment) (Greenberg, RDAP, 2010, UNAM 2012)
5. Name-authority control - exploratory study (Haven, 2009, INLS 720)
6. KO/metadata community practices - concurrent triangulation mixed methods (survey + simulation experiment) (White, 2010, ASIST, 2010 JLM)
7. Metadata functions - quantitative categorical analysis (Willis, Greenberg, and White, 2010, CODATA, 2012, JASIST)
8. Vocabulary needs (HIVE) – mapping study (Greenberg, 2009, CCQ; Scherle, 2010, Code4Lib)
9. Metadata theory – deductive analysis (Greenberg, 2009)

Page 16:

Interoperability slope

Semantic ontologies

Researcher names

Agency/institution

Page 17:

Page 18:

Page 19:

Package metadata harvested from email

Subj. 177 (gr. 97%, rd. 2%, bl. 1%)

Contr. 101 (gr. 99%, bl. 1%)

Page 20:

The leap - capital to metadata capital

An economic concept (Weber, 1905; Smith, 1776)
• Business and operations (net gains or losses)
• Finances, goods and services, and public needs
• Intellectual capital, social capital
• A tangible result, value increase

Metadata as an asset, a product
• Reuse of good quality metadata increases the value of the initial investment
• Poor quality may reduce metadata capital?
• Metadata reuse prevalence
• Cooperative cataloging, CIP, ISBD, MARC, FRBR, LCC, VIAF, OAI-PMH, CrossRef, PubMed, Zotero, BibTeX, DataCite, Linked data/Semantic Web, PIDs, etc.

Page 21:

Modified Capital-sigma notation

Reuse

$$R + \sum_{i=1}^{n} a_i = R + a_1 + a_2 + a_3 + \dots + a_n$$

R = value of the metadata record
i = number of usages
a = incremental increase in value
n = maximum number of reuses
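The modified capital-sigma notation on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `metadata_capital` and the sample values are hypothetical and are not taken from the Dryad study.

```python
from typing import Sequence


def metadata_capital(record_value: float, increments: Sequence[float]) -> float:
    """Return R + a_1 + a_2 + ... + a_n.

    record_value -- R, the value of the metadata record when first created
    increments   -- a_1..a_n, the incremental value added by each reuse
    """
    return record_value + sum(increments)


# Hypothetical values: a record worth 10.0 units that is reused three times.
initial_value = 10.0
reuse_increments = [1.0, 1.0, 0.5]
print(metadata_capital(initial_value, reuse_increments))
```

With no reuse (an empty list of increments) the function returns R unchanged, which matches the idea that capital accrues only when the record is reused.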

Page 22:

Author/Submitter | Curator

100 metadata instantiations
• 8 of 12 metadata properties had reuse @ 50% or greater
• 5 of 8 confirmed reuse at 80% or higher
• Basic bib. vs. complex

Page 23:

Author

Subject

Dcterms.spatial

DwC.ScientificName

Page 24:

Modified Capital-sigma notation for linked data

Reuse of linked data concept/URI

P = determined by the number of terms in an ontology, labor hours to generate, integrate, etc.

Page 25:

Helping Interdisciplinary Vocabulary Engineering (HIVE)

• C V cost, interoperability, and usability constraints
• Linked Open Vocabulary initiative, to support inter/transdisciplinary….
• SKOS (a little dumb)
• AMG + machine learning approach for integrating discipline terminologies

Page 26:

Page 27:

Amy

Meet Amy Zanne. She is a botanist.

Like every good scientist, she publishes, and she deposits data in Dryad.

Amy’s data

Page 28:

Page 29:

Successive growth rates

$$\sum_{i=1}^{n} i^{c} = \Theta\left(n^{c+1}\right)$$
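Reading the slide's formula as the standard power-sum bound (the sum of i^c for i from 1 to n grows like n^(c+1)), a quick numeric check can make the growth rate concrete. The helper names `power_sum` and `growth_ratio` are hypothetical, chosen only for this sketch.

```python
def power_sum(n: int, c: int) -> int:
    """Sum of i**c for i = 1..n."""
    return sum(i ** c for i in range(1, n + 1))


def growth_ratio(n: int, c: int) -> float:
    """Ratio of the power sum to n**(c + 1).

    If the sum is Theta(n**(c + 1)), this ratio stays bounded as n grows;
    for the power sum it settles near 1 / (c + 1).
    """
    return power_sum(n, c) / n ** (c + 1)


# The ratio for c = 2 levels off as n increases.
for n in (10, 100, 1000):
    print(n, growth_ratio(n, 2))
```

The printed ratios level off rather than growing or vanishing, which is what the Theta bound expresses.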

Cycles…

What about successive growth rate tied to a concept? A concept can be
• in ~ vernacular to canonical
• fall by the wayside, less popular
• out (deprecated)

Page 30:

Conclusion…other Valuation Approaches

Market cap of Facebook per user: $40 – $300

Revenues per record per user: $4 – $7 per year
• Facebook
• Experian

Market prices of personal data:
• $0.50 for street address
• $2.00 for date of birth
• $8 for social security number
• $3 for driver’s license number
• $35 for military record

SOURCE: OECD. Exploring the Economics of Personal Data: A Survey of Methodologies for Measuring Monetary Value. OECD Digital Economy Papers. Organisation for Economic Co-operation and Development Publishing, 2013.

Page 31:

Concluding remarks

• Interest….traction
• Limitations: bad data, cost/value
• We should care about cost
• Metadata capital can contextualize
• Generic formula for further research

Page 32:

Metadata Standards Directory Working Group….

Jane Greenberg, Alex Ball, Keith Jeffery, Rebecca Koskela

Page 33:

“…develop a collaborative, open directory of metadata standards applicable to scientific data”

Stakeholders: Researchers, data managers, data scientists, tool developers, repositories, agencies, societies (RDA’s growing community)

Goals and workplan - DCC Disciplinary Directory: http://www.dcc.ac.uk/resources/metadata-standards

Page 34:

Acknowledgments

Dryad Consortium Board, journal partners, and data authors

NESCent: Laura Wendell (Executive Director), Hilmar Lapp, Heather Piwowar, Peggy Schaeffer, Ryan Scherle, Todd Vision (PI)

Drexel/UNC <Metadata Research Center>: Jose R. Pérez-Agüera, Sarah Carrier, Elena Feinstein, Lina Huang, Robert Losee, Hollie White, Craig Willis, Jane Smith, Shea Swuager, Liz Turner, Christine Mayo, Adrian Ogletree, Erin Clary

U British Columbia: Michael Whitlock

NCSU Digital Libraries: Kristin Antelman

HIVE: Library of Congress, USGS, and The Getty Research Institute; and workshop hosts

Yale/TreeBASE: Youjun Guo, Bill Piel

DataONE: Rebecca Koskela, Bill Michener, Dave Veiglais, and many others

British Library: Lee-Ann Coleman, Adam Farquhar, Brian Hole

Oxford University: David Shotton

Page 35:

http://datadryad.org
http://blog.datadryad.org
http://datadryad.org/wiki

http://code.google.com/p/[email protected]

Facebook: Dryad
Twitter: @datadryad

http://ils.unc.edu/mrc/hive/
http://code.google.com/p/hive-mrc/

Metadata Research Center: http://cci.drexel.edu/mrc

Page 36:

Sustainability: Plan Comparison

Payment Plan | Member | Non-member | Minimum purchase
1. Voucher Plan | USD$65 per data package | USD$70 per data package | 25 vouchers
2. Deferred Payment Plan | USD$70 per data package | USD$75 per data package | 1 yr contract
3. Subscription Plan | Annual fee based on USD$25 per published research article | Annual fee based on USD$30 per published research article | 2 yr contract
For individuals: Pay on acceptance | NA | USD$80 per data package, payable by the submitter | 1 data package
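To compare the organizational plans, the arithmetic can be sketched as below. The prices are the ones listed on this slide; the function names and the sample volumes (40 data packages, 100 published articles) are hypothetical, and the sketch deliberately ignores minimum purchases and contract lengths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerPackagePlan:
    """A plan priced per deposited data package (prices in USD)."""
    name: str
    member_price: int
    non_member_price: int


# Prices taken from the plan comparison on the slide.
PLANS = [
    PerPackagePlan("Voucher", 65, 70),
    PerPackagePlan("Deferred Payment", 70, 75),
]


def per_package_cost(plan: PerPackagePlan, packages: int, member: bool) -> int:
    """Cost of depositing `packages` data packages under a per-package plan."""
    price = plan.member_price if member else plan.non_member_price
    return price * packages


def subscription_cost(articles: int, member: bool) -> int:
    """Annual subscription fee, based on published research articles."""
    return (25 if member else 30) * articles


# Hypothetical member journal: 40 data packages, 100 published articles.
for plan in PLANS:
    print(plan.name, per_package_cost(plan, 40, member=True))
print("Subscription", subscription_cost(100, member=True))
```

Because the subscription fee scales with published articles rather than deposits, which plan is cheapest depends on the share of articles that come with a data package.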

Page 37:

More on growth and sustainability

Membership: http://datadryad.org/pages/membershipOverview

Pricing and sponsorship of deposits: http://datadryad.org/pages/pricing

Journal integration: http://datadryad.org/pages/journalIntegration