Designing Data for Discovery: OCLC’s Linked Data Strategy

OCLC Member Forum – 2016
Rob Favini
Member Relations Liaison, OCLC Membership and Research Division
[email protected] @InfoFav
#oclcforum

Today’s session

• A little linked data background
• OCLC’s interest in linked data
• Examples of the work that we’ve been doing
• A couple experimental projects
• Discussion

What is linked data?

“The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web.”

Heath, T., Hepp, M., and Bizer, C. (eds.). Special Issue on Linked Data, International Journal on Semantic Web and Information Systems (IJSWIS). http://linkeddata.org/docs/ijswis-special-issue

Linked data benefits

• Data in context with a web address allowing for annotation and referencing
• Data is linked, allowing information to be combined across silos and enhanced by combination with third party data sources
• Data is accessible at a granular level over the web enabling applications to run from live data

Linked data is not…

• A structured set of procedures and standards…there’s no one agreed on manual
• A library-only thing
• A replacement for MARC
• Fully developed
• Turned on with a switch
• Going to solve EVERYTHING

Linked data is…

• Evolving in the library world
• Used to link related information across the Web
• A tool/methodology that will someday help to make library data more accessible by the greater Web
• Happening in many places today
• Agenda-free

http://blog.icpl.org/2015/10/06/farewell-catalog-card/

The classic bib record

• A collection of statements…
• Taken from the piece itself…
• Sometimes “enhanced” with inferred parentheticals (e.g., [1975])…
• Or additional statements not on the piece (e.g., subject headings)
• Where punctuation, which may or may not be present, is used (inconsistently) for structure
• Mostly uncontrolled and only loosely connected to anything else

THE PROBLEM

Actually, a number of problems

• Identification Problems:
  – “The Hamlet Problem” (titles aren’t enough)
  – “The John Rock” (names aren’t enough)
• Linkage Problems:
  – “The Web Problem” (records aren’t enough, you need links)
  – “The Language Problem” (surfacing the right translation for a given user)
• Quality Problems:
  – “The Legacy Problem” (strings are not controlled terms; often, they cannot be turned into them)

First, define ALL THE THINGS

“Now! … That should clear up a few things around here!”

entity /ˈɛntɪti/ (noun)
a thing with distinct and independent existence.

relationship /rɪˈleɪʃ(ə)nʃɪp/ (noun)
the way in which two or more people or things are connected.

Record
Title: "War and Peace"
Author: "Leo Tolstoy 1828-1910"
ISBN: 0307266931

Entity (http://worldcat.org/entity/work/id/115206288)
Type: Work
Name: "War and Peace"
Author: http://worldcat.org/entity/person/id/1234

Entity (http://worldcat.org/entity/person/id/1234)
Type: Person
Name: "Leo Tolstoy"
Born: 1828
Died: 1910
Birthplace: http://worldcat.org/entity/place/id/8976

Entity (http://worldcat.org/entity/place/id/8976)
Type: Place
Name: "Yasnaya Polyana"
SameAs: http://geonames.org/468686
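The difference between a flat record and linked entities can be sketched in a few lines of Python. This is a minimal illustration, not an OCLC data format: the URIs are the example identifiers from the slide, while the dictionary layout and the `follow` helper are assumptions made for the sketch.

```python
# Illustrative only: URIs are the slide's example identifiers; the
# dictionary layout and helper below are assumptions, not an OCLC format.
ENTITIES = {
    "http://worldcat.org/entity/work/id/115206288": {
        "type": "Work",
        "name": "War and Peace",
        "author": "http://worldcat.org/entity/person/id/1234",
    },
    "http://worldcat.org/entity/person/id/1234": {
        "type": "Person",
        "name": "Leo Tolstoy",
        "born": 1828,
        "died": 1910,
        "birthplace": "http://worldcat.org/entity/place/id/8976",
    },
    "http://worldcat.org/entity/place/id/8976": {
        "type": "Place",
        "name": "Yasnaya Polyana",
        "sameAs": "http://geonames.org/468686",
    },
}


def follow(uri, *properties):
    """Start at one entity and hop along link-valued properties in order."""
    entity = ENTITIES[uri]
    for prop in properties:
        entity = ENTITIES[entity[prop]]
    return entity


work_uri = "http://worldcat.org/entity/work/id/115206288"
birthplace = follow(work_uri, "author", "birthplace")
print(birthplace["name"])
```

Because the author and birthplace are identifiers rather than strings, a question such as "where was the author of this work born?" becomes two hops through the graph instead of a text-parsing exercise.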

Relationships between entities are established

[Diagram: entity types person, place, object, concept, organization, and work, linked by relationships labeled subject, item, availability, and author]

“Shredding”

LINKED DATA AND OCLC

OCLC-wide effort

OCLC RESEARCH

PRODUCT MANAGEMENT

WORLDCAT & DATA

INFRASTRUCTURE

Why we’re doing it

• The end goal of metadata is discovery-delivery-fulfillment; getting stuff in users’ hands!
• Making things more Web-search ready
• Recognize that search begins outside of the library
• Life after MARC
• Leverage rich WorldCat data

Needs of early adopters

• To convert text into identifiers
• To disambiguate similar names or labels
• To create global identifiers for local names
• To cooperatively manage new forms of metadata
• Have OCLC act as hub for this metadata
• Have OCLC commit to production-level services instead of experiments

Improving the discovery experience

Linking Translations Appropriately

Title: Journey to the West
Language: English
Translator: Anthony C. Yu
Date: 1977
IsTranslationOf:

Title: Journey to the West
Language: English
Translator: W. J. F. Jenner
Date: 1982-1984
IsTranslationOf:

Title: 西遊記
Language: Chinese
Author: 吳承恩
Created: 1592
HasTranslation:

Title: Tây du ký bình khảo
Language: Vietnamese
Translator: Phan Quân
Date: 1980
IsTranslationOf:

Title: 西遊記
Language: Japanese
Translator: 中野美代子
Date: 1986
IsTranslationOf:

Title: Pilgerfahrt
Language: German
Translator: Georgette Boner
Date: 1983
IsTranslationOf:
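A rough sketch of how translation links could be used to surface the right edition for a reader follows. It assumes that every IsTranslationOf link on the slide points to the Chinese original; the `Work` class and the `editions_for` function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Work:
    title: str
    language: str
    creator: str  # author of the original, or translator of a translation
    date: str
    translation_of: Optional["Work"] = None  # the IsTranslationOf link


# Assumption: all five translations link back to the Chinese original.
original = Work("西遊記", "Chinese", "吳承恩", "1592")
translations = [
    Work("Journey to the West", "English", "Anthony C. Yu", "1977", original),
    Work("Journey to the West", "English", "W. J. F. Jenner", "1982-1984", original),
    Work("Tây du ký bình khảo", "Vietnamese", "Phan Quân", "1980", original),
    Work("西遊記", "Japanese", "中野美代子", "1986", original),
    Work("Pilgerfahrt", "German", "Georgette Boner", "1983", original),
]


def editions_for(work, language, catalog):
    """Return the work itself if it is in `language`, else its translations in it."""
    if work.language == language:
        return [work]
    return [
        t for t in catalog
        if t.translation_of is work and t.language == language
    ]


for edition in editions_for(original, "English", translations):
    print(edition.title, "/", edition.creator, "/", edition.date)
```

Note that the Japanese translation carries the same title string as the Chinese original, so matching on titles alone would conflate the two. The explicit link and the language value are what keep them apart.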

Bringing Authority Control to the Web

Finding new efficiencies for linked data

WHAT WE'RE DOING

Collaborating on BIBFRAME

• Working with the Library of Congress and others to finalize the BIBFRAME standard
• Beginning to explore what working with it at scale will mean
• Early 2014 BIBFRAME Implementation Testbed established
• BIBFRAME Editor

Convergence of two projects/goals

• BIBFRAME
  – Data exchange in a linked data environment
  – Taking into account existing formats for resource description as well as interactions with search engines
  – Designed as a persistent standard for library description
• OCLC Linked Data
  – Linked data models optimized descriptions for library resources for discovery on Web beyond libraries
  – Vocabulary designed for consumption by general purpose search engines
  – Increased click through rates, greater visibility for libraries on the Web

www.oclc.org/research/publications/reports.html

OCLC working with the web

• Modeling bibliographic data using Schema.org
• Collaborating on expanding Schema.org with additional bibliographic elements at bib.schema.org
• Syndicating WorldCat data to search engines using Schema.org markup
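To make the idea of Schema.org markup concrete, here is a small sketch that builds a JSON-LD description of a book. It is not WorldCat's actual output: the `book_jsonld` function is a hypothetical helper, and the values are the illustrative ones from the War and Peace example earlier in the deck. The Schema.org terms used (`Book`, `Person`, `name`, `author`, `isbn`) are standard vocabulary.

```python
import json


def book_jsonld(name, author_name, isbn, author_uri=None):
    """Build a Schema.org Book description as a JSON-LD dictionary."""
    author = {"@type": "Person", "name": author_name}
    if author_uri:
        # Pointing at an identifier lets consumers link the author
        # to everything else known about that person.
        author["@id"] = author_uri
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": name,
        "author": author,
        "isbn": isbn,
    }


# Values taken from the deck's illustrative example, not from live data.
doc = book_jsonld(
    "War and Peace",
    "Leo Tolstoy",
    "0307266931",
    author_uri="http://worldcat.org/entity/person/id/1234",
)
print(json.dumps(doc, indent=2, ensure_ascii=False))
```

Markup like this is typically embedded in a web page inside a `<script type="application/ld+json">` element so that general purpose search engines can read it.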

Entities of Initial Focus

Entity types: person, place, object, concept, organization, work
Initial focus: work, person

Producing entities at scale

• Works: released 197 million work IDs for items in WorldCat
• People: 18 million entities now in progress

From records to entities: Works

Virtual International Authority File (VIAF) viaf.org

Personal Entity Lookup Service pilot

• Challenge
  – Many identifiers lack statements about relationships
  – How to determine that a dozen different identifiers all represent same person
• Outcomes
  – Developed APIs to create identifiers and search using text strings
• 7 members in pilot
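The challenge of deciding that many identifiers represent the same person can be illustrated with a small clustering sketch. This is not how the pilot service works; it simply assumes that some "same as" statements between identifiers are available and groups them with a union-find structure. The identifiers and the `cluster_identifiers` function are hypothetical.

```python
def cluster_identifiers(same_as_pairs):
    """Group identifiers connected, directly or indirectly, by same-as pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for identifier in list(parent):
        clusters.setdefault(find(identifier), set()).add(identifier)
    return list(clusters.values())


# Hypothetical identifiers from four made-up sources.
pairs = [
    ("sourceA:1", "sourceB:7"),
    ("sourceB:7", "sourceC:42"),
    ("sourceA:2", "sourceD:9"),
]
for cluster in cluster_identifiers(pairs):
    print(sorted(cluster))
```

The sketch also shows why missing relationship statements hurt: an identifier with no same-as statement never joins a cluster, however obviously it refers to the same person.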

WorldCat Identities www.worldcat.org/identities

http://experimental.worldcat.org/xfinder/cookbookfinder.html

http://experimental.worldcat.org/kindredworks/

http://experimental.worldcat.org/xfinder/fictionfinder.html

OCLC’s linked data resources

WorldCat Catalog: 15 billion triples
WorldCat Works: 5 billion RDF triples
FAST: 23 million triples
VIAF: 2 billion triples
ISNI: 10-50 million triples

WHAT WE'RE WORKING ON NOW

Google Knowledge Vault

Library Knowledge Vault

[Diagram. Stages: Data Sources, Knowledge Triples, Scored Triples, Knowledge Vault. Data sources: Enhanced WorldCat, VIAF, FAST, each with its own Extractor. Processing labels: Extraction, Graph-based Priors, Fusers, Fusion.]
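The step from extracted triples to scored triples can be sketched as follows. The slides do not say how fusion is done, so this uses a simple noisy-OR combination as a stand-in and leaves out graph-based priors entirely. The `fuse` function, the extractor confidences, and the sample triples are all assumptions for illustration.

```python
from collections import defaultdict


def fuse(extractions):
    """Combine (triple, confidence) pairs from several extractors via noisy-OR.

    A triple's fused score is 1 minus the probability that every
    extractor reporting it was wrong, treating extractors as independent.
    """
    miss = defaultdict(lambda: 1.0)
    for triple, confidence in extractions:
        miss[triple] *= 1.0 - confidence
    return {triple: 1.0 - m for triple, m in miss.items()}


# Hypothetical output of three extractors; confidences are made up.
extractions = [
    (("Leo Tolstoy", "born", "1828"), 0.5),  # e.g. from one data source
    (("Leo Tolstoy", "born", "1828"), 0.5),  # same fact, second source
    (("Leo Tolstoy", "born", "1829"), 0.25),  # conflicting, weaker claim
]
for triple, score in fuse(extractions).items():
    print(triple, round(score, 2))
```

A fact reported independently by two sources ends up with a higher score than either report alone, while the conflicting claim seen only once stays low.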

STAY INFORMED

For more information

http://www.oclc.org/blog/main/

http://www.oclc.org/research/presentations.html

http://www.oclc.org/research/publications/all.html

OCLCResearch

Jonathan Rochkind's post on Linked Data Caution bit.ly/1R4kxyH