OCLC and Linked Data: An update on infrastructure testing ...
Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer,...
![Page 1: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/1.jpg)
Member Forum • 16 December 2016
Data Designed for Discovery
Roy Tennant, Senior Program Officer, OCLC Research
![Page 2: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/2.jpg)
• This is the Research view of linked data
• We (OCLC) have experiments and prototypes, but no products or production services (yet)
• We (OCLC Research) have been working with linked data for as long as anyone in the library world
• Our (OCLC Research) playground is the entirety of WorldCat (380 million records) and a parallel computing cluster
• Stay tuned for more information on production services
A few introductory remarks
![Page 3: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/3.jpg)
WHY LINKED DATA?
![Page 4: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/4.jpg)
What we have to work with
![Page 5: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/5.jpg)
• A collection of text strings…
• Taken from the piece itself…
• Sometimes “enhanced” with inferred parentheticals (e.g., [1975])…
• Or additional statements not on the piece (e.g., subject headings)
• Punctuation, which may or may not be present, is used (inconsistently) for structure
• Mostly uncontrolled and only loosely connected to anything else
• Designed for description rather than discovery
What we have to work with
![Page 6: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/6.jpg)
THE PROBLEM
![Page 7: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/7.jpg)
• Identification Problems (illustrated next):
  – The Title Problem
  – The Names Problem
• Quality Problems (illustrated next):
  – The Legacy Problem (strings are not controlled terms; often, they cannot be turned into them)
• Linkage Problems:
  – The Web Problem (records aren’t enough, you need links)
  – The Language Problem (showing the right translation for a given user)
Actually, A Number of Problems
![Page 8: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/8.jpg)
![Page 9: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/9.jpg)
![Page 10: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/10.jpg)
Data Quality Problems
![Page 11: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/11.jpg)
THE SOLUTION
![Page 12: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/12.jpg)
First, define ALL
THE THINGS
![Page 13: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/13.jpg)
Quick Definitions
entity/ˈɛntɪti/noun
a thing with distinct and independent existence.
relationship/rɪˈleɪʃ(ə)nʃɪp/noun
the way in which two or more people or things are connected
![Page 14: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/14.jpg)
Albert Einstein Person
Relativity: The Special and General Theory Work
Physics Concept
author
about
…establish relationships with other entities
![Page 15: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/15.jpg)
https://www.wikidata.org/wiki/Q937 and http://viaf.org/viaf/75121530 (Wikidata and VIAF)
http://experiment.worldcat.org/entity/work/data/369081611 (WorldCat Works)
http://id.loc.gov/authorities/subjects/sh85101653.html (Library of Congress Subject Headings)
author
about
…with actionable links from authoritative data hubs
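The entity-and-relationship picture above can be written down as subject, predicate, object statements. The following is a minimal sketch, not OCLC's actual data model: it reuses the three URIs from the slide verbatim, and it assumes that the Schema.org properties `author` and `about` are acceptable stand-ins for the slide's two arrows. The helper name `to_ntriples` is hypothetical.

```python
# Entity URIs copied from the slide.
EINSTEIN = "http://viaf.org/viaf/75121530"
WORK = "http://experiment.worldcat.org/entity/work/data/369081611"
PHYSICS = "http://id.loc.gov/authorities/subjects/sh85101653.html"

# Assumption: Schema.org properties stand in for the "author" and "about" arrows.
SCHEMA_AUTHOR = "http://schema.org/author"
SCHEMA_ABOUT = "http://schema.org/about"

# Each relationship is one (subject, predicate, object) triple.
triples = [
    (WORK, SCHEMA_AUTHOR, EINSTEIN),
    (WORK, SCHEMA_ABOUT, PHYSICS),
]


def to_ntriples(statements):
    """Serialize URI-only triples as N-Triples lines (hypothetical helper)."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in statements)


print(to_ntriples(triples))
```

Because every part of each statement is a URI, a consumer can follow any of them to learn more about the person, the work, or the concept, which is the sense in which the links are actionable.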
![Page 16: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/16.jpg)
From Records to Entities: Works
![Page 17: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/17.jpg)
![Page 18: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/18.jpg)
![Page 19: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/19.jpg)
![Page 20: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/20.jpg)
![Page 21: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/21.jpg)
OCLC Production Services
External OCLC Research Systems
Internal OCLC Research Resources
enhanced WorldCat
WORKS
Kindred Works
Classify
Identities
FictionFinder
Cookbook Finder
LCSH
FAST
VIAF
GMGPC
GSAFD
GTT
DDC
LCTGM
MeSH
Linked Data Entities
![Page 22: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/22.jpg)
OCLC’s linked data resources
WorldCat Catalog: 15 billion triples
WorldCat Works: 5 billion RDF triples
FAST: 23 million triples
VIAF: 2 billion triples
ISNI: 10-50 million triples
![Page 23: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/23.jpg)
VIAF aggregates identifiers
![Page 24: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/24.jpg)
Wikidata disseminates identifiers
![Page 25: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/25.jpg)
OCLC’S 2015 INTERNATIONAL LINKED DATA SURVEY
SOURCE: KAREN SMITH-YOSHIMURA
![Page 26: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/26.jpg)
2015 responding institutions by type (71 institutions total)
[Pie chart. Legend, in extracted order: Academic library, National library, Network, Government, Scholarly, Public Library, Museum, Other. Values, in extracted order: 31%, 20%, 14%, 10%, 8%, 7%, 4%, 6%.]
![Page 27: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/27.jpg)
What is published as linked data
[Bar chart, horizontal axis 0 to 60. Categories: Authority files; Bibliographic data; Data about museum objects; Datasets; Descriptive metadata; Digital collections; Encoded archival descriptions; Geographic data; Ontologies/vocabularies; Other.]
![Page 28: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/28.jpg)
2015 linked data sources most consumed (2015 counts)
VIAF (Virtual International Authority File): 41
DBpedia: 36
GeoNames: 35
id.loc.gov: 35
Resources we convert to linked data ourselves: 17
Getty's AAT: 16
FAST (Faceted Application of Subject Terminology): 15
WorldCat.org: 15
data.bnf.fr: 12
Deutsche National Bib Linked Data Service: 12
![Page 29: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/29.jpg)
SOLVING PROBLEMS & MOVING TOWARD A LINKED DATA FUTURE
![Page 30: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/30.jpg)
Improving the Discovery Experience
![Page 31: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/31.jpg)
![Page 32: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/32.jpg)
![Page 33: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/33.jpg)
Exploring Ways to Use Linked Data
![Page 34: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/34.jpg)
![Page 35: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/35.jpg)
![Page 36: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/36.jpg)
Title: Journey to the West
Language: English
Translator: Anthony C. Yu
Date: 1977
IsTranslationOf:

Title: Journey to the West
Language: English
Translator: W. J. F. Jenner
Date: 1982-1984
IsTranslationOf:

Title: 西遊記
Language: Chinese
Author: 吳承恩
Created: 1592
HasTranslation:

Title: Tây du ký bình khảo
Language: Vietnamese
Translator: Phan Quân
Date: 1980
IsTranslationOf:

Title: 西遊記
Language: Japanese
Translator: 中野美代子
Date: 1986
IsTranslationOf:

Title: Pilgerfahrt
Language: German
Translator: Georgette Boner
Date: 1983
IsTranslationOf:
Offering the right translation
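Once the original and its translations are modeled as linked entities rather than unrelated records, picking one for a given reader becomes a small lookup. The sketch below is only an illustration under stated assumptions: the `Expression` class, the `pick_for_user` function, and the rule of falling back to the original when no preferred language matches are all hypothetical; the titles, languages, names, and dates are the ones on the slide.

```python
from dataclasses import dataclass


@dataclass
class Expression:
    title: str
    language: str        # language label as given on the slide
    contributor: str     # author (for the original) or translator
    date: str
    is_original: bool = False


# The original work first, then its translations, as listed on the slide.
JOURNEY = [
    Expression("西遊記", "Chinese", "吳承恩", "1592", is_original=True),
    Expression("Journey to the West", "English", "Anthony C. Yu", "1977"),
    Expression("Journey to the West", "English", "W. J. F. Jenner", "1982-1984"),
    Expression("Tây du ký bình khảo", "Vietnamese", "Phan Quân", "1980"),
    Expression("西遊記", "Japanese", "中野美代子", "1986"),
    Expression("Pilgerfahrt", "German", "Georgette Boner", "1983"),
]


def pick_for_user(expressions, preferred_languages):
    """Return the first expression in the user's most preferred language.

    Assumption: if none of the preferred languages is available,
    fall back to the original (or None if no original is listed).
    """
    for language in preferred_languages:
        for expression in expressions:
            if expression.language == language:
                return expression
    return next((e for e in expressions if e.is_original), None)


print(pick_for_user(JOURNEY, ["German"]).title)
```

With two English translations available, this sketch simply returns the first one listed; a real service would need a further rule (date, holdings, popularity) to choose between them.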
![Page 37: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/37.jpg)
![Page 38: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/38.jpg)
Bringing Authority Control to the Web
![Page 39: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/39.jpg)
• Person Lookup Service – An experimental service for looking up OCLC Person Entities
• Scenario:
  – A library wants to disambiguate a name
  – It sends the name text string to our API
  – We check all of our aggregated authority files and send back the best match(es)
  – Each response comes with one or more URIs (e.g., to LCNAF, Wikidata, ISNI, etc.)
  – The library inserts this data into their record, turning a text string into an actionable link on the web
Prototyping New Services
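The lookup scenario above can be sketched as a local function. This is not the Person Lookup Service API, whose request and response formats are not given in the slides; it is a hypothetical in-memory stand-in in which `AUTHORITY_INDEX`, `lookup_person`, the token-overlap score, and the `min_score` threshold are all assumptions. The Einstein URIs come from the earlier slide; the second entry uses a placeholder URI that is not a real identifier.

```python
import re

# Hypothetical stand-in for the aggregated authority files.
AUTHORITY_INDEX = [
    {
        "label": "Albert Einstein",
        "uris": [
            "http://viaf.org/viaf/75121530",
            "https://www.wikidata.org/wiki/Q937",
        ],
    },
    {
        "label": "Alfred Einstein",
        # Placeholder URI for illustration only, not a real identifier.
        "uris": ["http://example.org/person/placeholder-1"],
    },
]


def name_tokens(name):
    """Lowercase a name and split it into a set of word tokens."""
    return set(re.findall(r"\w+", name.lower()))


def lookup_person(name, min_score=0.5):
    """Return candidate matches for a name string, best match first.

    The score is the Jaccard overlap of name tokens (an assumption),
    so "Einstein, Albert" and "Albert Einstein" score 1.0.
    """
    query = name_tokens(name)
    matches = []
    for entry in AUTHORITY_INDEX:
        candidate = name_tokens(entry["label"])
        union = query | candidate
        score = len(query & candidate) / len(union) if union else 0.0
        if score >= min_score:
            matches.append(
                {"label": entry["label"], "score": score, "uris": entry["uris"]}
            )
    return sorted(matches, key=lambda m: m["score"], reverse=True)


for match in lookup_person("Einstein, Albert"):
    print(match["label"], match["uris"])
```

The library-side step is then to store one of the returned URIs alongside the name string in its own record.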
![Page 40: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/40.jpg)
Replicate existing library functions more cheaply and efficiently
Improve data integration
A better user experience
Greater Web visibility
Develop better models of resources not well served by current standards
Improve internal data management
In Summary: Why Linked Data?
![Page 41: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/41.jpg)
EASING THE TRANSITION
![Page 42: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/42.jpg)
• Working with the Library of Congress and others to finalize the BIBFRAME standard
• Beginning to explore what working with it at scale will mean
Collaborating on BIBFRAME
![Page 43: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/43.jpg)
• Modeling bibliographic data using Schema.org
• Collaborating on expanding Schema.org with additional bibliographic elements at bib.schema.org
• Syndicating WorldCat data to search engines using Schema.org markup
Working With the Web
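As an illustration of what Schema.org markup for a bibliographic item can look like, the sketch below builds a small JSON-LD description from the Einstein example used earlier in the deck. It is an assumption about shape, not the markup WorldCat actually publishes: the function name `book_jsonld` is hypothetical, and the choice of the `Book` type with the `name`, `author`, `about`, and `sameAs` properties is one reasonable modeling among several.

```python
import json


def book_jsonld(name, author_name, author_uris, about_uri):
    """Build a Schema.org description of a book as a JSON-LD dict (hypothetical helper)."""
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": name,
        "author": {
            "@type": "Person",
            "name": author_name,
            # sameAs ties the person to identifiers in external data hubs.
            "sameAs": author_uris,
        },
        "about": {"@id": about_uri},
    }


markup = book_jsonld(
    name="Relativity: The Special and General Theory",
    author_name="Albert Einstein",
    author_uris=[
        "http://viaf.org/viaf/75121530",
        "https://www.wikidata.org/wiki/Q937",
    ],
    about_uri="http://id.loc.gov/authorities/subjects/sh85101653.html",
)

# Embedded in a web page inside a <script type="application/ld+json"> element,
# this is the kind of structured description a search engine can read.
print(json.dumps(markup, indent=2, ensure_ascii=False))
```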
![Page 44: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/44.jpg)
Learning About Changing Workflows
Photo by https://www.flickr.com/photos/sanjoselibrary/ - CC BY-SA 2.0
![Page 45: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/45.jpg)
![Page 46: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/46.jpg)
• Use uniform titles
• Use added entries with role codes (7xx and $4)
• Use 041 for translations, including intermediate translations
• Use indicators to refine the meaning
• Use the most specific fields appropriate for a descriptive task
• Minimize the use of 500 fields
• Obey field semantics
• Avoid redundancy
If you must use free text:
• Use established conventions
• Use standardized terms
Least machine-processable
Most machine-processable
Algorithmically recoverable
Making MARC “Linked Data Ready”
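A small sketch may help show why a coded added entry is easier for software to use than a free-text note. It assumes a deliberately simplified record structure (a list of dicts with one value per subfield, which real MARC does not guarantee) and a two-entry subset of the MARC relator codes; the function name `contributors` and the sample record are hypothetical.

```python
# Small subset of MARC relator codes, for illustration.
RELATOR_TERMS = {"aut": "author", "trl": "translator"}


def contributors(fields):
    """Return (name, role) pairs from 700 added entries that carry a $4 role code.

    Simplification: each subfield code maps to a single value here,
    whereas real MARC subfields can repeat.
    """
    found = []
    for field in fields:
        if field["tag"] != "700":
            continue  # free-text notes such as 500 are not interpreted
        name = field["subfields"].get("a")
        code = field["subfields"].get("4")
        if name and code:
            found.append((name, RELATOR_TERMS.get(code, code)))
    return found


# Hypothetical record: the same fact stated twice, once coded and once as free text.
record = [
    {"tag": "700", "subfields": {"a": "Yu, Anthony C.", "4": "trl"}},
    {"tag": "500", "subfields": {"a": "Translated by Anthony C. Yu."}},
]

print(contributors(record))
```

The 700 field yields a name and a role with a simple dictionary lookup, while recovering the same relationship from the 500 note would require parsing English prose.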
![Page 47: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/47.jpg)
The Charge: How should URIs be added to MARC records to ease the transition to Linked Data?
Participants:
• British Library, German National Library, Library of Congress, National Library of Medicine, OCLC
• University libraries at Cornell, Columbia, George Washington, Harvard, Ohio State, Stanford, University of Washington
Creating Standards for URIs
![Page 48: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/48.jpg)
• We are in a major transition that will take YEARS to navigate
• We don’t know yet exactly what the future holds…
• ...but we know that it will be more linked and machine readable (actionable) than ever before
• And that’s a Good Thing
Summary Remarks
![Page 49: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/49.jpg)
For More Information
![Page 50: Data Designed for Discovery - OCLC...Data Designed for Discovery Roy Tennant Senior Program Officer, OCLC Research • This is the Research view of linked data • We (OCLC) have experiments](https://reader033.fdocuments.us/reader033/viewer/2022042217/5ec1ba0f16421d42917d1501/html5/thumbnails/50.jpg)
Together we make breakthroughs possible.℠
Thank you!
Roy Tennant
@[email protected]
OCLC Member Forum • 3 Nov 2016
©2016 OCLC. This work is licensed under a Creative Commons Attribution 4.0 International License. Suggested attribution: “This work uses content from “Data Designed for Discovery” © OCLC, used under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0/.”