AIB CILW 2016 Conference, Rome
October 21, 2016
Because the web of data doesn’t organize itself
OCLC Research’s contributions to linked data in the library community
Titia van der Werf
Senior Program Officer
Web of Documents
• Web pages or other documents
• Human-readable text
• Independent
• Static
Web of Data
• Statements about entities, or ‘Things’
• Machine-processable data
• Integrated
• Actionable
The two models of the Web
An example: a Knowledge Card
Albert Einstein (Person)
Relativity: The Special and General Theory (Work)
Physics (Subject)
Work → author → Person
Work → about → Subject
Entities and relationships
Person (Wikidata and VIAF): https://www.wikidata.org/wiki/Q937 and http://viaf.org/viaf/75121530
Work (WorldCat Works): http://experiment.worldcat.org/entity/work/data/369081611
Subject (Library of Congress Subject Headings): http://id.loc.gov/authorities/subjects/sh85101653.html
Relationships: author, about
…linked for machine understanding
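The entity-and-relationship picture above can be sketched as a handful of subject-predicate-object statements. This is a minimal illustration in plain Python, not an RDF serialization. The URIs are the ones on the slide, and the pairing of each URI with Person, Work, or Subject is inferred from the slide layout. The predicate names "author" and "about" are the slide's labels rather than terms from a published vocabulary, and the "sameAs" statement linking the two identifiers for the person is an assumption added for illustration.

```python
# URIs copied from the slide; their roles (person, work, subject) are inferred
# from the slide layout.
EINSTEIN_VIAF = "http://viaf.org/viaf/75121530"
EINSTEIN_WIKIDATA = "https://www.wikidata.org/wiki/Q937"
RELATIVITY_WORK = "http://experiment.worldcat.org/entity/work/data/369081611"
PHYSICS_LCSH = "http://id.loc.gov/authorities/subjects/sh85101653.html"

# Each statement is a (subject, predicate, object) triple.
# "author" and "about" are the slide's labels; "sameAs" is an assumed link
# between the two identifiers for the same person.
triples = [
    (RELATIVITY_WORK, "author", EINSTEIN_VIAF),
    (RELATIVITY_WORK, "about", PHYSICS_LCSH),
    (EINSTEIN_VIAF, "sameAs", EINSTEIN_WIKIDATA),
]


def objects(subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj):
    """Return every subject linked to `obj` by `predicate`."""
    return [s for s, p, o in triples if p == predicate and o == obj]


# Who is the author of the work, and which works are about physics?
work_authors = objects(RELATIVITY_WORK, "author")
physics_works = subjects("about", PHYSICS_LCSH)
```

Because every entity is named by a URI rather than a text string, a program can follow the statements from the work to its author and on to the equivalent identifier in another dataset without any string matching.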
THE OCLC RESEARCH INTERNATIONAL LINKED DATA SURVEYS FOR IMPLEMENTERS
KAREN SMITH-YOSHIMURA
Geographic breakdown of 90 responding institutions
20 countries represented
[Bar chart, horizontal axis 0 to 45]
USA
Spain
UK
The Netherlands
Norway
Canada
Australia
France
Germany
Italy
Switzerland
Austria
Czech Republic
Hungary
Ireland
Japan
Malaysia
Portugal
Singapore
Sweden
Linked Data Survey Respondents
Academic library
National library
Network
Government
Scholarly
Public Library
Museum
Other
Shares shown in the pie chart: 31%, 20%, 14%, 10%, 8%, 7%, 4%, 6%
2015 responding institutions by type
What is published as linked data
[Bar chart, horizontal axis 0 to 60]
Authority files
Bibliographic data
Data about museum objects
Datasets
Descriptive metadata
Digital collections
Encoded archival descriptions
Geographic data
Ontologies/vocabularies
Other
• Steep learning curve for staff
• Inconsistent legacy data
• Difficulties in
– selecting appropriate ontologies to model data
– establishing links
• Little documentation or advice on how to build the systems
Barriers to publishing linked data
VIAF
DBpedia
GeoNames
id.loc.gov
“Resources we convert to linked data ourselves”
Getty's Art and Architecture Thesaurus
FAST (Faceted Application of Subject Terminology)
WorldCat.org
data.bnf.fr
Deutsche Nationalbibliothek Linked Data Service
2015 linked data resources most consumed
DBpedia
Libraries, publishing
Life sciences
Social networking
Government
• Unreliable quality of published linked data
– not always reusable
– lack of authority control or URIs
– stale or obsolete datasets
• Difficulty understanding its structure and meaning
• Matching, disambiguating, and aligning locally produced data with third-party resources
• Mapping vocabulary
• Size of RDF datasets: too large or too small
Barriers to consuming linked data
Maturity
Analysis
Implementation
• Expose our data to larger Web audience
• Demonstrate what can be done
• Heard about it and wanted to try it
• Improve SEO
• Create a richer user experience
• Enhance our own data
• Improve internal metadata management
• Achieve greater accuracy and scope in search results
• Experiment with data integration
Publishing
Consuming
Both
Reasons for publishing and consuming linked data
OCLC RESEARCH’S CONTRIBUTIONS
WorldCat growth since 1998
Millions of records, as of 27 April 2012:
1998: 39
1999: 41
2000: 44
2001: 47
2002: 50
2003: 52
2004: 55
2005: 61
2006: 67
2007: 86
2008: 108
2009: 139
2010: 197
2011: 236
2012: 264
In aggregations:
• data lose their local context
• data get lost in the bigger context
Making sense of data at the aggregate level:
• FRBR
• GLIMIR
• VIAF
• FAST
• Mining for entities/names
Aggregating data
Manifestations
Reproductions
Translations
Works
FRBRisation of WorldCat: 2006 - now
GLIMIR: Clustering records which differ in language and cataloguing rules
2014: 197 million bibliographic work descriptions available as Linked Data
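The idea of clustering records that differ only in language of cataloguing and cataloguing rules can be illustrated with a simple key-based grouping. This sketch is not OCLC's GLIMIR or FRBR algorithm, whose details the slides do not give. It assumes hypothetical records that share a standard identifier, and the record ids, ISBN values, and field names are all invented for the example.

```python
from collections import defaultdict

# Hypothetical records: the first two describe the same item but were
# catalogued in different languages under different rules.
# All ids, ISBNs, and field names are invented for illustration.
records = [
    {"id": "rec-1", "isbn": "9780000000011", "catalog_lang": "eng", "rules": "AACR2"},
    {"id": "rec-2", "isbn": "9780000000011", "catalog_lang": "ger", "rules": "RAK"},
    {"id": "rec-3", "isbn": "9780000000028", "catalog_lang": "fre", "rules": "AFNOR"},
]


def cluster_by_key(recs, key="isbn"):
    """Group record ids that share the same value for `key`.

    Language of cataloguing and cataloguing rules are deliberately ignored,
    so parallel descriptions of one item land in the same cluster.
    """
    clusters = defaultdict(list)
    for rec in recs:
        clusters[rec[key]].append(rec["id"])
    return dict(clusters)


clusters = cluster_by_key(records)
```

A production system would need fuzzier matching than a single shared identifier, since many records lack one or carry conflicting values. The sketch only shows the shape of the result: one cluster per item, each holding several parallel descriptions.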
VIAF
Virtual International Authority File
• Merge of 24+ national level authority files
• Cooperative program run by OCLC
• Initiated by LoC, DNB, BnF and OCLC
• 29 million authority records
• 112 million bibliographic records
• Migrated from an OCLC Research project to an OCLC service in 2012
• VIAF is available as linked data
OCLC’s linked data resources
WorldCat Catalog
WorldCat Works
FAST
VIAF
ISNI
The EntityJS explorer
Show related entities
WHAT WE’VE LEARNED
Linked data in the library community:
Where the effort is focused
Data publishing
Data consumption
Application development
?
Why linked data?
Replicate existing library functions more cheaply and efficiently
Improve data integration
A better user experience
Greater Web visibility
Develop better models of resources not well served by current standards
Improve internal data management
Library linked data is not…
A silver bullet
A killer app
A panacea
But it is...
The result of cumulative and joint effort
Together we make breakthroughs possible.
Acknowledgements
Jean Godby
Karen Smith-Yoshimura
Comments?
Titia van der Werf