Designing Data for Discovery: OCLC’s Linked Data Strategy

Post on 26-Jun-2020

#oclcforum

OCLC Member Forum – 2016

Designing Data for Discovery: OCLC’s Linked Data Strategy

Rob Favini
Member Relations Liaison, OCLC Membership and Research Division
favinir@oclc.org @InfoFav

• A little linked data background
• OCLC’s interest in linked data
• Examples of the work that we’ve been doing
• A couple of experimental projects
• Discussion

Today’s session

“The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web.”

What is linked data?

Heath, T., Hepp, M., and Bizer, C. (eds.). Special Issue on Linked Data, International Journal on Semantic Web and Information Systems (IJSWIS). http://linkeddata.org/docs/ijswis-special-issue

• Data in context with a web address allowing for annotation and referencing

• Data is linked, allowing information to be combined across silos and enhanced by combination with third party data sources

• Data is accessible at a granular level over the web enabling applications to run from live data

Linked data benefits

• A structured set of procedures and standards… there’s no single agreed-upon manual
• A library-only thing
• A replacement for MARC
• Fully developed
• Turned on with a switch
• Going to solve EVERYTHING

Linked data is not…

• Evolving in the library world
• Used to link related information across the Web
• A tool/methodology that will someday help to make library data more accessible to the greater Web
• Happening in many places today
• Agenda-free

Linked data is…

http://blog.icpl.org/2015/10/06/farewell-catalog-card/

• A collection of statements…
• Taken from the piece itself…
• Sometimes “enhanced” with inferred parentheticals (e.g., [1975])…
• Or additional statements not on the piece (e.g., subject headings)
• Where punctuation, which may or may not be present, is used (inconsistently) for structure
• Mostly uncontrolled and only loosely connected to anything else

The classic bib record

THE PROBLEM

• Identification Problems:
  – “The Hamlet Problem” (titles aren’t enough)
  – “The John Rock” (names aren’t enough)
• Linkage Problems:
  – “The Web Problem” (records aren’t enough, you need links)
  – “The Language Problem” (surfacing the right translation for a given user)
• Quality Problems:
  – “The Legacy Problem” (strings are not controlled terms; often, they cannot be turned into them)

Actually, a number of problems

First, define ALL

THE THINGS

“Now! … That should clear up
A few things around here!”

entity /ˈɛntɪti/ noun

a thing with distinct and independent existence.

relationship /rɪˈleɪʃ(ə)nʃɪp/ noun

the way in which two or more people or things are connected.

Record
Title: "War and Peace"
Author: "Leo Tolstoy 1828-1910"
ISBN: 0307266931

Entity (http://worldcat.org/entity/work/id/115206288)
Type: Work
Name: "War and Peace"
Author: http://worldcat.org/entity/person/id/1234

Entity (http://worldcat.org/entity/person/id/1234)
Type: Person
Name: "Leo Tolstoy"
Born: 1828
Died: 1910
Birthplace: http://worldcat.org/entity/place/id/8976

Entity (http://worldcat.org/entity/place/id/8976)
Type: Place
Name: "Yasnaya Polyana"
SameAs: http://geonames.org/468686

person place

object concept

organization work

subject item availability

author

Relationships between entities are established

“Shredding”
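The "shredding" step above can be sketched in a few lines, assuming a flat record shaped like the War and Peace example. The `shred` and `mint` helpers and the `http://example.org/entity/...` URIs are hypothetical names made up for illustration; they are not OCLC's identifiers or process.

```python
import re
from itertools import count

# Hypothetical URI base for illustration; real WorldCat entity URIs are minted by OCLC.
BASE = "http://example.org/entity"
_ids = count(1)

def mint(kind):
    """Return a new illustrative URI for an entity of the given kind."""
    return f"{BASE}/{kind}/id/{next(_ids)}"

def shred(record):
    """Split a flat record into a Person entity and a Work entity linked by URI."""
    # Separate the name from trailing life dates, e.g. "Leo Tolstoy 1828-1910".
    m = re.fullmatch(r"(.+?)\s+(\d{4})-(\d{4})", record["Author"])
    if m:
        name, born, died = m.group(1), int(m.group(2)), int(m.group(3))
    else:
        name, born, died = record["Author"], None, None

    person_uri = mint("person")
    work_uri = mint("work")
    return {
        person_uri: {"Type": "Person", "Name": name, "Born": born, "Died": died},
        # The work points at the person by URI instead of repeating the string.
        # The ISBN identifies one edition, so this sketch leaves it off the work.
        work_uri: {"Type": "Work", "Name": record["Title"], "Author": person_uri},
    }

record = {"Title": "War and Peace", "Author": "Leo Tolstoy 1828-1910", "ISBN": "0307266931"}
entities = shred(record)
for uri, entity in entities.items():
    print(uri, entity)
```

Once the author is an entity with its own address, a second record by the same person can link to that address rather than carry another copy of the name string.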

LINKED DATA AND OCLC

OCLC-wide effort

OCLC RESEARCH

PRODUCT MANAGEMENT

WORLDCAT & DATA

INFRASTRUCTURE

• The end goal of metadata is discovery-delivery-fulfillment; getting stuff in users’ hands!

• Making things more Web-search ready
• Recognize that search begins outside of the library
• Life after MARC
• Leverage rich WorldCat data

Why we’re doing it

• To convert text into identifiers
• To disambiguate similar names or labels
• To create global identifiers for local names
• To cooperatively manage new forms of metadata
• Have OCLC act as hub for this metadata
• Have OCLC commit to production-level services instead of experiments

Needs of early adopters

Improving the discovery experience

Title: Journey to the West
Language: English
Translator: Anthony C. Yu
Date: 1977
IsTranslationOf:

Title: Journey to the West
Language: English
Translator: W. J. F. Jenner
Date: 1982-1984
IsTranslationOf:

Title: 西遊記
Language: Chinese
Author: 吳承恩
Created: 1592
HasTranslation:

Title: Tây du ký bình khảo
Language: Vietnamese
Translator: Phan Quân
Date: 1980
IsTranslationOf:

Title: 西遊記
Language: Japanese
Translator: 中野美代子
Date: 1986
IsTranslationOf:

Title: Pilgerfahrt
Language: German
Translator: Georgette Boner
Date: 1983
IsTranslationOf:

Linking Translations Appropriately
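One way to picture "The Language Problem" with the Journey to the West records above is a small graph walk: follow the translation link back to the original, then pick the versions in the reader's language. This is a rough sketch only; the dictionary keys (`xiyouji`, `yu`, and so on) and the `translation_of` field are made-up stand-ins for real entity URIs and for the IsTranslationOf link.

```python
# Hypothetical local keys stand in for entity URIs; titles and translators come from the slide.
works = {
    "xiyouji": {"title": "西遊記", "language": "Chinese", "translation_of": None},
    "yu": {"title": "Journey to the West", "language": "English", "translation_of": "xiyouji"},
    "jenner": {"title": "Journey to the West", "language": "English", "translation_of": "xiyouji"},
    "phan": {"title": "Tây du ký bình khảo", "language": "Vietnamese", "translation_of": "xiyouji"},
    "nakano": {"title": "西遊記", "language": "Japanese", "translation_of": "xiyouji"},
    "boner": {"title": "Pilgerfahrt", "language": "German", "translation_of": "xiyouji"},
}

def original_of(key):
    """Follow translation_of links back to the original work."""
    seen = set()
    while works[key]["translation_of"] is not None and key not in seen:
        seen.add(key)  # guard against accidental cycles in the data
        key = works[key]["translation_of"]
    return key

def versions_in(key, language):
    """Return the keys of every version of the same work in the given language."""
    root = original_of(key)
    return [k for k in works if original_of(k) == root and works[k]["language"] == language]

# A reader who lands on the German translation but prefers English:
print(versions_in("boner", "English"))  # ['yu', 'jenner']
```

Note that the Chinese original and the Japanese translation share the title 西遊記, so a title string alone could not tell them apart; the link and the language field do.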

Bringing Authority Control to the Web

Finding new efficiencies for linked data

WHAT WE'RE DOING

• Working with the Library of Congress and others to finalize the BIBFRAME standard

• Beginning to explore what working with it at scale will mean

• Early 2014: BIBFRAME Implementation Testbed established

• BIBFRAME Editor

Collaborating on BIBFRAME

• BIBFRAME
  – Data exchange in a linked data environment
  – Taking into account existing formats for resource description as well as interactions with search engines
  – Designed as a persistent standard for library description
• OCLC Linked Data
  – Linked data models optimize descriptions of library resources for discovery on the Web beyond libraries
  – Vocabulary designed for consumption by general-purpose search engines
  – Increased click-through rates, greater visibility for libraries on the Web

Convergence of two projects/goals

www.oclc.org/research/publications/reports.html

• Modeling bibliographic data using Schema.org
• Collaborating on expanding Schema.org with additional bibliographic elements at bib.schema.org
• Syndicating WorldCat data to search engines using Schema.org markup

OCLC working with the web
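As a rough illustration of what Schema.org markup for a bibliographic item can look like, the sketch below builds a small JSON-LD description and wraps it in a script tag of the kind a web page can embed. The `book_jsonld` helper is hypothetical, the person URI is the placeholder shown on the earlier entity slide, and the output is not a copy of the markup WorldCat actually publishes.

```python
import json

def book_jsonld(title, author_name, author_uri, isbn):
    """Describe a book with Schema.org terms, linking the author by identifier."""
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "isbn": isbn,
        "author": {"@type": "Person", "@id": author_uri, "name": author_name},
    }

markup = book_jsonld(
    title="War and Peace",
    author_name="Leo Tolstoy",
    author_uri="http://worldcat.org/entity/person/id/1234",  # placeholder URI from the slide
    isbn="0307266931",
)

# Serialize as a JSON-LD script block for embedding in an HTML page.
body = json.dumps(markup, indent=2, ensure_ascii=False)
script_tag = '<script type="application/ld+json">\n' + body + "\n</script>"
print(script_tag)
```

The author appears as a linked Person with an `@id` rather than a bare string, which is the same move from text to identifier described earlier in the deck.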

person place

object concept

organization work

Entities of Initial Focus

work

person

• Works: released 197 million work IDs for items in WorldCat

• People: 18 million entities now in progress

Producing entities at scale

From records to entities: Works

Virtual International Authority File (VIAF) viaf.org

• Challenge
  – Many identifiers lack statements about relationships
  – How to determine that a dozen different identifiers all represent the same person
• Outcomes
  – Developed APIs to create identifiers and search using text strings
• 7 members in pilot

Personal Entity Lookup Service pilot
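To make "search using text strings" concrete, here is a toy lookup over an in-memory list. It is only a sketch: the `ENTITIES` list, the `person/1` style identifiers, and the `lookup` function are invented for illustration and do not reflect the pilot's actual API. The two entries echo "The John Rock" problem from earlier: the same name string, two different people, told apart by a birth year.

```python
import unicodedata

# Illustrative entries with made-up identifiers; two people share the label "John Rock".
ENTITIES = [
    {"id": "person/1", "labels": ["John Rock", "Rock, John"], "born": 1890},
    {"id": "person/2", "labels": ["John Rock", "John S. Rock", "Rock, John S."], "born": 1825},
]

def normalize(text):
    """Reduce a name to lowercase, accent-free, order-independent tokens."""
    decomposed = unicodedata.normalize("NFKD", text)
    kept = "".join(c for c in decomposed if c.isalnum() or c.isspace())
    return " ".join(sorted(kept.lower().split()))

def lookup(text, born=None):
    """Return identifiers whose labels match the text, optionally narrowed by birth year."""
    key = normalize(text)
    hits = [e for e in ENTITIES if key in {normalize(label) for label in e["labels"]}]
    if born is not None:
        hits = [e for e in hits if e["born"] == born]
    return [e["id"] for e in hits]

print(lookup("John Rock"))              # ambiguous: both candidates come back
print(lookup("Rock, John", born=1825))  # the extra statement narrows it to one
```

The string alone returns two candidates; it takes an additional statement about the person to settle on a single identifier, which is the challenge the slide names.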

WorldCat Identities www.worldcat.org/identities

http://experimental.worldcat.org/xfinder/cookbookfinder.html

http://experimental.worldcat.org/kindredworks/

http://experimental.worldcat.org/xfinder/fictionfinder.html

OCLC’s linked data resources

WorldCat Catalog: 15 billion triples

WorldCat Works: 5 billion RDF triples

FAST: 23 million triples

VIAF: 2 billion triples

ISNI: 10-50 million triples

Works

WHAT WE'RE WORKING ON NOW

Google Knowledge Vault

[Diagram labels: Data Sources; Extractors; Knowledge Triples; Graph-based Priors; Fusers; Scored Triples; Knowledge Vault. Stages: Extraction, Fusion.]

Library Knowledge Vault

[Diagram labels: Enhanced WorldCat; VIAF; FAST; three Extractors; Fusers.]
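The extraction and fusion stages named in the diagram can be pictured with a very small example: several extractors each propose triples with a confidence, and a fuser combines them into one score per triple. The sketch below uses a noisy-or combination purely as a simple stand-in; it is an assumption for illustration, not the method used by Google or by OCLC, and the source names, triples, and confidence values are all made up.

```python
from collections import defaultdict

# (source, (subject, predicate, object), confidence); every value here is illustrative.
extractions = [
    ("worldcat", ("person/1234", "birthYear", 1828), 0.9),
    ("viaf", ("person/1234", "birthYear", 1828), 0.8),
    ("fast", ("person/1234", "birthYear", 1829), 0.3),
]

def fuse(extractions):
    """Combine per-extractor confidences into one score per triple (noisy-or)."""
    # Track the probability that every extractor asserting the triple was wrong.
    all_wrong = defaultdict(lambda: 1.0)
    for _source, triple, confidence in extractions:
        all_wrong[triple] *= 1.0 - confidence
    return {triple: 1.0 - p for triple, p in all_wrong.items()}

scored = fuse(extractions)
for triple, score in sorted(scored.items(), key=lambda item: -item[1]):
    print(f"{score:.2f}", triple)
```

A triple asserted independently by two sources ends up with a higher score than either source gave it alone, while the conflicting value keeps its single low score.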

STAY INFORMED

For more information

http://www.oclc.org/blog/main/

http://www.oclc.org/research/presentations.html

http://www.oclc.org/research/publications/all.html

OCLCResearch

Jonathan Rochkind's post on Linked Data Caution bit.ly/1R4kxyH

Thank you.

memberrelations@oclc.org

Rob Favini
favinir@oclc.org
@InfoFav