Post on 26-Jun-2020
#oclcforum
OCLC Member Forum – 2016
Designing Data for Discovery: OCLC’s Linked Data Strategy
Rob Favini
Member Relations Liaison, OCLC Membership and Research Division
favinir@oclc.org
@InfoFav
• A little linked data background
• OCLC’s interest in linked data
• Examples of the work that we’ve been doing
• A couple of experimental projects
• Discussion
Today’s session
“The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web.”
What is linked data?
Heath, T., Hepp, M., and Bizer, C. (eds.). Special Issue on Linked Data, International Journal on Semantic Web and Information Systems (IJSWIS). http://linkeddata.org/docs/ijswis-special-issue
• Data in context with a web address allowing for annotation and referencing
• Data is linked, allowing information to be combined across silos and enhanced by combination with third party data sources
• Data is accessible at a granular level over the web enabling applications to run from live data
Linked data benefits
• A structured set of procedures and standards… there’s no one agreed-on manual
• A library-only thing
• A replacement for MARC
• Fully developed
• Turned on with a switch
• Going to solve EVERYTHING
Linked data is not…
• Evolving in the library world
• Used to link related information across the Web
• A tool/methodology that will someday help to make library data more accessible by the greater Web
• Happening in many places today
• Agenda-free
Linked data is…
http://blog.icpl.org/2015/10/06/farewell-catalog-card/
• A collection of statements…
• Taken from the piece itself…
• Sometimes “enhanced” with inferred parentheticals (e.g., [1975])…
• Or additional statements not on the piece (e.g., subject headings)
• Where punctuation, which may or may not be present, is used (inconsistently) for structure
• Mostly uncontrolled and only loosely connected to anything else
The classic bib record
THE PROBLEM
• Identification Problems:
– “The Hamlet Problem” (titles aren’t enough)
– “The John Rock” (names aren’t enough)
• Linkage Problems:
– “The Web Problem” (records aren’t enough, you need links)
– “The Language Problem” (surfacing the right translation for a given user)
• Quality Problems:
– “The Legacy Problem” (strings are not controlled terms; often, they cannot be turned into them)
Actually, a number of problems
First, define ALL THE THINGS
“Now! … That should clear up a few things around here!”
entity /ˈɛntɪti/ noun
a thing with distinct and independent existence.
relationship /rɪˈleɪʃ(ə)nʃɪp/ noun
the way in which two or more people or things are connected.
Record
Title: "War and Peace"
Author: "Leo Tolstoy 1828-1910"
ISBN: 0307266931
⟶
Entity (http://worldcat.org/entity/work/id/115206288)
Type: Work
Name: "War and Peace"
Author: http://worldcat.org/entity/person/id/1234
Entity (http://worldcat.org/entity/person/id/1234)
Type: Person
Name: "Leo Tolstoy"
Born: 1828
Died: 1910
Birthplace: http://worldcat.org/entity/place/id/8976
Entity (http://worldcat.org/entity/place/id/8976)
Type: Place
Name: "Yasnaya Polyana"
SameAs: http://geonames.org/468686
person, place, object, concept, organization, work
subject, item, availability, author
Relationships between entities are established
“Shredding”
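The record-to-entity split above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Entity` class, the `shred` function, and the `make_minter` helper are hypothetical names, and the `example.org` URIs stand in for real identifiers. Nothing here describes OCLC's actual pipeline.

```python
import itertools
import re
from dataclasses import dataclass, field


@dataclass
class Entity:
    uri: str
    type: str
    properties: dict = field(default_factory=dict)


def make_minter(base="http://example.org/entity"):
    """Return a function that mints placeholder URIs (hypothetical scheme)."""
    counter = itertools.count(1)

    def mint(kind):
        return f"{base}/{kind}/id/{next(counter)}"

    return mint


def shred(record, mint_uri):
    """Split a flat record into a Work and a Person entity linked by URI."""
    # Pull life dates out of a heading such as "Leo Tolstoy 1828-1910".
    match = re.fullmatch(r"(.+?)\s+(\d{4})-(\d{4})", record["Author"])
    if match:
        name, born, died = match.group(1), int(match.group(2)), int(match.group(3))
    else:
        name, born, died = record["Author"], None, None

    person = Entity(mint_uri("person"), "Person",
                    {"Name": name, "Born": born, "Died": died})
    # The work points at the person by URI instead of repeating a text string.
    work = Entity(mint_uri("work"), "Work",
                  {"Name": record["Title"], "Author": person.uri})
    return [work, person]


record = {"Title": "War and Peace",
          "Author": "Leo Tolstoy 1828-1910",
          "ISBN": "0307266931"}
work, person = shred(record, make_minter())
print(work)
print(person)
```

The author string becomes a link to a separate entity, so the relationship "author" is a statement between two identified things rather than text inside one record. As on the slide, the ISBN is not carried onto the work entity.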
LINKED DATA AND OCLC
OCLC-wide effort
OCLC RESEARCH
PRODUCT MANAGEMENT
WORLDCAT & DATA INFRASTRUCTURE
• The end goal of metadata is discovery-delivery-fulfillment; getting stuff in users’ hands!
• Making things more Web-search ready
• Recognize that search begins outside of the library
• Life after MARC
• Leverage rich WorldCat data
Why we’re doing it
• To convert text into identifiers
• To disambiguate similar names or labels
• To create global identifiers for local names
• To cooperatively manage new forms of metadata
• Have OCLC act as hub for this metadata
• Have OCLC commit to production-level services instead of experiments
Needs of early adopters
Improving the discovery experience
Title: Journey to the West
Language: English
Translator: Anthony C. Yu
Date: 1977
IsTranslationOf:

Title: Journey to the West
Language: English
Translator: W. J. F. Jenner
Date: 1982-1984
IsTranslationOf:

Title: 西遊記
Language: Chinese
Author: 吳承恩
Created: 1592
HasTranslation:

Title: Tây du ký bình khảo
Language: Vietnamese
Translator: Phan Quân
Date: 1980
IsTranslationOf:

Title: 西遊記
Language: Japanese
Translator: 中野美代子
Date: 1986
IsTranslationOf:

Title: Pilgerfahrt
Language: German
Translator: Georgette Boner
Date: 1983
IsTranslationOf:
Linking Translations Appropriately
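Once translations are linked to their original, surfacing the right edition for a given user becomes a lookup over those links. The sketch below uses the values from the slide; the `Expression` class and the `editions_for` function are hypothetical names chosen for illustration, and the fallback rule (show the original when no translation matches) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Expression:
    title: str
    language: str
    date: str
    translator: Optional[str] = None


# The original work and the translations that point back to it.
original = Expression("西遊記", "Chinese", "1592")
translations = [
    Expression("Journey to the West", "English", "1977", "Anthony C. Yu"),
    Expression("Journey to the West", "English", "1982-1984", "W. J. F. Jenner"),
    Expression("Tây du ký bình khảo", "Vietnamese", "1980", "Phan Quân"),
    Expression("西遊記", "Japanese", "1986", "中野美代子"),
    Expression("Pilgerfahrt", "German", "1983", "Georgette Boner"),
]

# IsTranslationOf and HasTranslation are two directions of the same link.
is_translation_of = {t: original for t in translations}
has_translation = {original: translations}


def editions_for(work, preferred_language):
    """Return editions in the user's language, falling back to the original."""
    if work.language == preferred_language:
        return [work]
    matches = [t for t in has_translation.get(work, [])
               if t.language == preferred_language]
    return matches or [work]


for edition in editions_for(original, "English"):
    print(edition.title, edition.translator, edition.date)
```

Note that title strings alone could not do this: two English editions share one title, and the Japanese translation shares its title with the Chinese original.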
Bringing Authority Control to the Web
Finding new efficiencies for linked data
WHAT WE'RE DOING
• Working with the Library of Congress and others to finalize the BIBFRAME standard
• Beginning to explore what working with it at scale will mean
• Early 2014 BIBFRAME Implementation Testbed established
• BIBFRAME Editor
Collaborating on BIBFRAME
• BIBFRAME
– Data exchange in a linked data environment
– Taking into account existing formats for resource description as well as interactions with search engines
– Designed as a persistent standard for library description
• OCLC Linked Data
– Linked data models with descriptions of library resources optimized for discovery on the Web beyond libraries
– Vocabulary designed for consumption by general purpose search engines
– Increased click-through rates, greater visibility for libraries on the Web
Convergence of two projects/goals
www.oclc.org/research/publications/reports.html
• Modeling bibliographic data using Schema.org
• Collaborating on expanding Schema.org with additional bibliographic elements at bib.schema.org
• Syndicating WorldCat data to search engines using Schema.org markup
OCLC working with the web
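To make "Schema.org markup" concrete, here is a minimal sketch that builds a JSON-LD description of a book with Python's standard `json` module. `Book`, `Person`, `name`, and `author` are real Schema.org terms; the `book_jsonld` function is a hypothetical helper, the URIs are the ones shown on the earlier entity slide, and the output is not claimed to match the markup OCLC actually publishes.

```python
import json


def book_jsonld(work_uri, title, author_uri, author_name):
    """Build a minimal Schema.org description of a book as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "@id": work_uri,
        "name": title,
        "author": {
            "@type": "Person",
            "@id": author_uri,
            "name": author_name,
        },
    }


doc = book_jsonld(
    "http://worldcat.org/entity/work/id/115206288",
    "War and Peace",
    "http://worldcat.org/entity/person/id/1234",
    "Leo Tolstoy",
)

# A page would embed this string in a <script type="application/ld+json"> element.
print(json.dumps(doc, indent=2, ensure_ascii=False))
```

General purpose search engines read this kind of embedded description directly, which is why the vocabulary choice matters for visibility beyond the library catalog.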
person, place, object, concept, organization, work
Entities of Initial Focus
work, person
• Works: released 197 million work IDs for items in WorldCat
• People: 18 million entities now in progress
Producing entities at scale
From records to entities: Works
• Challenge
– Many identifiers lack statements about relationships
– How to determine that a dozen different identifiers all represent the same person
• Outcomes
– Developed APIs to create identifiers and search using text strings
• 7 members in pilot
Personal Entity Lookup Service pilot
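Where "same as" statements between identifiers do exist, grouping the identifiers that represent one person is a connected-components problem. The sketch below uses a union-find structure; it is an assumption-laden illustration, not the pilot's method, and the identifier strings (`viaf:1`, `isni:1`, and so on) are made up. It does nothing for identifiers that lack relationship statements, which is the harder case the slide describes.

```python
def cluster_identifiers(same_as_pairs):
    """Group identifiers connected by sameAs statements (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for ident in list(parent):
        clusters.setdefault(find(ident), set()).add(ident)
    return list(clusters.values())


# Hypothetical identifiers and links, for illustration only.
pairs = [
    ("viaf:1", "isni:1"),
    ("isni:1", "local:42"),
    ("viaf:2", "local:7"),
]
clusters = cluster_identifiers(pairs)
for cluster in clusters:
    print(sorted(cluster))
```

Each resulting cluster is a candidate for a single person entity with one global identifier.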
http://experimental.worldcat.org/xfinder/cookbookfinder.html
http://experimental.worldcat.org/xfinder/fictionfinder.html
OCLC’s linked data resources
WorldCat Catalog: 15 billion triples
WorldCat Works: 5 billion RDF triples
FAST: 23 million triples
VIAF: 2 billion triples
ISNI: 10-50 million triples
Works
WHAT WE'RE WORKING ON NOW
Google Knowledge Vault
[Diagram labels: Data Sources; Extraction; Graph-based Priors; Fusion; Knowledge Triples; Scored Triples; Knowledge Vault]
Library Knowledge Vault
[Diagram labels: Enhanced WorldCat; VIAF; FAST; Extractor (three); Fusers]
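The fusion step, where triples extracted from several sources become scored triples, can be sketched as follows. The noisy-OR combination used here is an assumption picked for illustration, not the scoring method of either knowledge vault, and the source names, confidences, and the `fuse` function are hypothetical.

```python
from collections import defaultdict


def fuse(extractions):
    """Combine per-source confidences for each triple with a noisy-OR."""
    by_triple = defaultdict(list)
    for source, triple, confidence in extractions:
        by_triple[triple].append(confidence)

    scored = {}
    for triple, confidences in by_triple.items():
        miss = 1.0
        for c in confidences:
            miss *= 1.0 - c  # probability that every source is wrong
        scored[triple] = 1.0 - miss
    return scored


# (source, (subject, predicate, object), confidence); all values are made up.
extractions = [
    ("WorldCat", ("person/1234", "born", "1828"), 0.5),
    ("VIAF", ("person/1234", "born", "1828"), 0.5),
    ("FAST", ("person/1234", "born", "1829"), 0.2),
]
scores = fuse(extractions)
for triple, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(triple, round(score, 2))
```

A statement asserted independently by two sources ends up with a higher score than one asserted by a single, less confident source.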
STAY INFORMED
For more information
http://www.oclc.org/blog/main/
http://www.oclc.org/research/presentations.html
http://www.oclc.org/research/publications/all.html
OCLCResearch
Thank you.
memberrelations@oclc.org
Rob Favini
favinir@oclc.org
@InfoFav