Posted on 01-May-2020
LINKED DATA: OCLC OFFERINGS AND EXPERIMENTS
OCLC Informationstag 2012 – Frankfurt
Titia van der Werf, Senior Program Officer, OCLC Research
OCLC Research
CONTENT
• Digital library innovation in Europe
• Role of OCLC Research
• OCLC Research Linked Data Activity
• Discussion
DIGITAL LIBRARY INNOVATION IN EUROPE
Innovation at the European level: EC ICT-FP research projects for digital libraries
• High investment in innovation
• Building an EU-wide network of experts
• Gap between academic research and practical implementation of research outcomes
• High project management overhead
• Driven by EU policy and politics
• Mutual competition and temporary alliances – no long-term/self-sustainable cooperation in innovation.
DIGITAL LIBRARY INNOVATION IN EUROPE
Innovation at the national level:
• Funds managed by national agencies like JISC, DFG, SURF, etc.
• Funding of institutions with national tasks and responsibilities
• Driven by national policy and politics
• Mutual competition and temporary alliances – no long-term/self-sustainable cooperation in innovation.
DIGITAL LIBRARY INNOVATION IN EUROPE
Innovation at the institutional level:
• Complex ecosystem to manage
ECOSYSTEM OF A SINGLE LIBRARY TODAY
[Figure: the ecosystem of a single library today. Components shown: Users; Vendors; Library OPAC; ILS; Circulation; Cataloging; Self Service; Acquisitions; Cataloging Utility; National/Global System; Consortial System; Electronic Vendor; A to Z List; Resolver; ERM; Institutional Repository; Metasearch.]
DIGITAL LIBRARY INNOVATION IN EUROPE
Innovation at the institutional level:
• Complex ecosystem to manage (previous slide)
• Wide range of staff expertise/skills required to keep this ecosystem up and running – local capacities are overstretched
• Fragmentation of innovation effort
• Budget constraints aggravated by the economic crisis
• Need to evolve towards a model of shared innovation
• Need to redefine the boundaries of the local/shared/external ecosystem
BOUNDARIES: THINGS ONLY WE CAN DO, THINGS WE CAN DO TOGETHER, THINGS WE SHOULDN’T HAVE TO DO ANYMORE
Engagement
• Attract and build relationships with end users
• “Service-oriented”, customization, etc.
Innovation
• Develop new services
• Take up new technologies
• Speed/flexibility important
Infrastructure: SHARED OCLC PLATFORM
• Back-office capacities that support day-to-day operations
• “Routinized” workflows
• Economies of scale important
[Figure: the OCLC platform. Components shown: Platform Management; Infrastructure; Data; Web Services; App Gallery; OCLC-built, library-built and partner-built applications. Caption: a flexible, open platform for the community to share applications and innovation.]
OBJECTIVE AND ROLES OF OCLC RESEARCH
• Objective
  To expand knowledge that advances OCLC’s public purposes of furthering access to the world’s information and reducing library costs.
• Roles
  1. Act as a community resource for shared R&D
  2. Advance ideas for service improvement, new services and technology adoption within OCLC product groups
  3. Engage the library and archive community through research-focused working groups and collaborations.
OCLC – 3 CONSTITUENCIES
• OCLC Services
• OCLC Research Library Partnership
• OCLC Membership
THE OCLC RESEARCH PROCESS
[Figure: the OCLC Research process, leading from shared uncertainties to community solutions. Activities shown: build community; create consensus; identify best practice; perform research & business intelligence; produce outcomes; transfer technology; develop & deploy; build prototypes; convene experts; develop architecture & standards.]
LINKED DATA
• Global nodes for referencing:
  • information about entities (persons, organisations, books, events, geographical locations, plants and crops, etc.)
  • identified on the web with an HTTP URI.
• The same entity can be described by different URIs:
  • http://viaf.org/viaf/102333412 identifies the same person as http://dbpedia.org/resource/Jane_Austen, without claiming that the information about this person is the same.
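To make the distinction concrete, here is a minimal Python sketch, assuming a toy in-memory store: each URI keeps its own description, and a separate link statement records that the two URIs identify the same person. Using `owl:sameAs` as the link predicate is an assumption for illustration (formally it lets a reasoner merge statements about both URIs, which is stronger than the slide's wording); the sketch itself only groups identifiers and never merges descriptions. The description values are placeholders, not data from VIAF or DBpedia.

```python
# Minimal sketch: record that two URIs identify the same entity and find
# all identifiers for it, while each URI keeps its own description.

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

VIAF_AUSTEN = "http://viaf.org/viaf/102333412"
DBPEDIA_AUSTEN = "http://dbpedia.org/resource/Jane_Austen"

# Placeholder descriptions, maintained separately by each source.
descriptions = {
    VIAF_AUSTEN: {"maintained_by": "VIAF"},
    DBPEDIA_AUSTEN: {"maintained_by": "DBpedia"},
}

# Link statements as (subject, predicate, object) triples.
links = [(VIAF_AUSTEN, OWL_SAME_AS, DBPEDIA_AUSTEN)]


def identifiers_for(uri, triples):
    """Return every URI connected to `uri` through same-entity links."""
    neighbours = {}
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            # Treat the link as symmetric for lookup purposes.
            neighbours.setdefault(s, set()).add(o)
            neighbours.setdefault(o, set()).add(s)
    seen, stack = {uri}, [uri]
    while stack:
        current = stack.pop()
        for nxt in neighbours.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


if __name__ == "__main__":
    for u in sorted(identifiers_for(VIAF_AUSTEN, links)):
        # Descriptions are looked up per URI; nothing is merged.
        print(u, descriptions[u])
```

Keeping the link separate from the descriptions means either source can revise its own data without touching the other's.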
LINKED DATA
• Linked data are published as a dataset using web standards both for browsing (human-readable) and for re-use across applications (machine-readable: RDF, SPARQL).
• Linked data usually include links to other relevant URIs (e.g., descriptions of the same entity in other languages).
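As an illustration of the machine-readable side, the sketch below serializes a few statements as N-Triples, one of the W3C RDF serializations, using only the Python standard library. The choice of N-Triples, the `rdfs:label` and `owl:sameAs` predicates and the language-tagged labels are assumptions for the example; a production system would normally use an RDF library rather than hand-written string formatting.

```python
from typing import NamedTuple, Optional

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


class Literal(NamedTuple):
    """A plain-text value, optionally tagged with a language code."""
    text: str
    lang: Optional[str] = None


def escape_literal(text: str) -> str:
    # N-Triples string escapes for backslash, double quote and line breaks.
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples(triples) -> str:
    """Serialize (subject URI, predicate URI, object) tuples as N-Triples.

    An object is either a URI string or a Literal.
    """
    lines = []
    for subject, predicate, obj in triples:
        if isinstance(obj, Literal):
            rendered = '"' + escape_literal(obj.text) + '"'
            if obj.lang:
                rendered += "@" + obj.lang
        else:
            rendered = "<" + obj + ">"
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines) + "\n"


AUSTEN = "http://dbpedia.org/resource/Jane_Austen"
triples = [
    (AUSTEN, RDFS_LABEL, Literal("Jane Austen", "en")),
    (AUSTEN, RDFS_LABEL, Literal("Jane Austen", "fr")),
    # Link to another dataset's URI for the same person.
    (AUSTEN, OWL_SAME_AS, "http://viaf.org/viaf/102333412"),
]

if __name__ == "__main__":
    print(to_ntriples(triples), end="")
```

Each output line is one self-contained statement, which is why N-Triples is convenient for bulk dumps that other applications can re-use.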
LINKED DATA
• Library Linked Data Incubator Group http://www.w3.org/2005/Incubator/lld/XGR-lld-usecase/
  • Use cases for semantic web technology implementation in libraries.
  • To demonstrate the benefit of linked data for library resources and the value of sharing these descriptions among libraries and beyond.
Make your stuff available on the web.
Make it available as structured data…
…in a non-proprietary format.
Use HTTP URIs to identify things.
Link your data to other people’s data.
Source: W3C
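The five steps can be read as a cumulative checklist, the way the W3C-hosted “five-star” linked open data scheme is usually presented: each step assumes the previous ones. The Python sketch below scores a dataset description on that basis. The dictionary keys are hypothetical names chosen for this example. The open-licence requirement of the full five-star scheme is left out because the slide does not mention it.

```python
# Hypothetical keys, one per step on the slide, in order.
STEPS = (
    "on_web",           # Make your stuff available on the web.
    "structured",       # Make it available as structured data...
    "non_proprietary",  # ...in a non-proprietary format.
    "http_uris",        # Use HTTP URIs to identify things.
    "linked",           # Link your data to other people's data.
)


def stars(dataset: dict) -> int:
    """Count how many steps are met in order, stopping at the first gap."""
    score = 0
    for step in STEPS:
        if not dataset.get(step, False):
            break
        score += 1
    return score


if __name__ == "__main__":
    # For example, a CSV file on the web with no URIs or outgoing links.
    csv_on_web = {"on_web": True, "structured": True, "non_proprietary": True}
    print(stars(csv_on_web))  # 3
```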
LINKED DATA INITIATIVES
• DBpedia: http://en.wikipedia.org/wiki/Dbpedia
  A community effort to extract structured information from Wikipedia and to make this information available as linked data. The DBpedia dataset describes more than 3.64 million things.
• Linking Open Government Data: http://logd.tw.rpi.edu/home
  Instance Hub Project: demos of linked data across diverse categories derived from US Government data (US agencies, crops, toxic chemicals, etc.).
• BBC Sport: http://www.bbc.co.uk/blogs/bbcinternet/2012/04/sports_dynamic_semantic.html
  Linked data about sports (players, teams, matches, leagues and divisions, events and competitions).
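Datasets such as DBpedia can be queried with SPARQL over HTTP. The Python sketch below builds a GET request following the SPARQL 1.1 Protocol but does not send it. The query goes in the `query` parameter and the desired result format in the `Accept` header. The endpoint URL `https://dbpedia.org/sparql` and the `dbo:author` property are assumptions about DBpedia's setup that should be checked before use.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed public endpoint; check the DBpedia site for the current address.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# Works whose dbo:author is Jane Austen (property name assumed from the
# DBpedia ontology).
QUERY = """PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?work WHERE {
  ?work dbo:author <http://dbpedia.org/resource/Jane_Austen> .
}
LIMIT 10"""


def sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    req = sparql_request(DBPEDIA_ENDPOINT, QUERY)
    print(req.full_url)
```

Sending the request requires network access, for example by passing it to `urllib.request.urlopen` and decoding the JSON response body.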
OCLC LINKED DATA ACTIVITY
• Which entities in bibliographic descriptions are useful to publish as linked data?
Authorities
VIAF: combines multiple name authority records into a single name cluster and identifies it with one URI.
The VIAF initiative started in 2003 with LoC, DNB, BnF and OCLC Research.
2012: 9M name clusters; cooperation with 22 agencies from 19 countries; transitioned to production.
OCLC LINKED DATA ACTIVITY
• Which entities in bibliographic descriptions are useful to publish as linked data?
Subject headings
FAST: a simplified and faceted version of LCSH.
FAST as linked data is available at http://id.worldcat.org/fast/ under the Open Data Commons Attribution Licence. It is also available for download under this licence at http://www.oclc.org/research/activities/fast/download.htm.
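Faceting means breaking a pre-coordinated heading string into independent facets that can be searched and linked separately. The toy Python sketch below splits an LCSH-style heading on its `--` subdivision separator and assigns each part to a facet using tiny hypothetical lookup tables. It shows the idea only and does not reproduce FAST's actual conversion rules or facet assignments.

```python
import re

# Tiny hypothetical lookup tables; a real conversion would rely on
# authority data, not hand-written lists.
GEOGRAPHIC = {"Germany", "France", "England"}
FORM = {"Maps", "Periodicals", "Bibliography"}
YEAR_RANGE = re.compile(r"^\d{3,4}(-\d{3,4})?$")


def facet(heading: str) -> dict:
    """Split an LCSH-style heading on '--' and sort parts into facets."""
    facets = {}
    for part in (p.strip() for p in heading.split("--")):
        if part in GEOGRAPHIC:
            kind = "geographic"
        elif part in FORM:
            kind = "form"
        elif YEAR_RANGE.match(part):
            kind = "chronological"
        else:
            kind = "topical"  # fallback for anything not recognized
        facets.setdefault(kind, []).append(part)
    return facets


if __name__ == "__main__":
    print(facet("Railroads--Germany--Maps"))
    # prints {'topical': ['Railroads'], 'geographic': ['Germany'], 'form': ['Maps']}
```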
MOVING BEYOND MARC: THE QUALITY CHALLENGE
• OCLC Data Architecture Group
Bibliographic records contain an enormous amount of information that needs to be prepared for a linked-data representation.
We need to manage entities and relationships across the data in individual datastores, and to synthesize and extract assertions about entities.
• Karen Coyle argues that we need to start a program to make our data linkable. Regardless of what happens to the linked-data ambition, we need to do this work anyway to make our data more machine-processable.
Karen Coyle, 2012. Taking library data from here to there. http://lists.w3.org/Archives/Public/public-esw-thes/2012Feb/0001.html
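The sketch below shows a small instance of this preparation work: pulling entity assertions out of a bibliographic record and cleaning them of transcription punctuation. It works on a simplified, dictionary-shaped record using real MARC 21 tags (100 personal name, 245 title, 650 topical subject). The record, the `http://example.org/` URI base and the choice of schema.org properties are assumptions for illustration. Real MARC parsing and entity matching are considerably harder.

```python
import re

SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Simplified, hypothetical record: tag -> subfield codes -> values.
record = {
    "001": "12345",
    "100": {"a": "Austen, Jane,", "d": "1775-1817."},
    "245": {"a": "Pride and prejudice /", "c": "by Jane Austen."},
    "650": [{"a": "Sisters"}],
}


def clean(value: str) -> str:
    """Strip trailing ISBD-style punctuation such as ' /' or ','.

    Naive: it would also strip a meaningful final period (e.g. 'Jr.').
    """
    return value.rstrip(" /:;,.")


def slug(text: str) -> str:
    """Turn text into a lower-case, hyphen-separated URI path segment."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def extract(rec: dict, base: str = "http://example.org/") -> list:
    """Return (subject, predicate, object) tuples; objects are URIs or plain strings."""
    work = f"{base}record/{rec['001']}"
    triples = [(work, RDF_TYPE, SCHEMA + "Book")]
    if "245" in rec:
        triples.append((work, SCHEMA + "name", clean(rec["245"]["a"])))
    if "100" in rec:
        name = clean(rec["100"]["a"])
        dates = clean(rec["100"].get("d", ""))
        person = f"{base}person/{slug(name + ' ' + dates)}"
        triples += [
            (person, RDF_TYPE, SCHEMA + "Person"),
            (person, SCHEMA + "name", name),
            (work, SCHEMA + "author", person),
        ]
    for field in rec.get("650", []):
        topic = clean(field["a"])
        concept = f"{base}topic/{slug(topic)}"
        triples += [
            (concept, SCHEMA + "name", topic),
            (work, SCHEMA + "about", concept),
        ]
    return triples


if __name__ == "__main__":
    for t in extract(record):
        print(t)
```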
FUTURE EXPERIMENTS
• Expose (parts of) WorldCat as linked data
  • Which model? BL model; Europeana model; Schema.org model; etc.
• Populating discovery services with OCLC linked data
  • Europeana
  • DPLA
  • Other interested parties
• What use will the community make of OCLC linked data? What will the take-up be? How will they use the data?
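Of the candidate models, the Schema.org option is the quickest to sketch. The Python example below emits a JSON-LD description of a book, identified by a WorldCat-style URI and linked to a VIAF author URI. The OCLC number is a placeholder, and the choice of properties is an assumption about what such a model might include, not a description of any OCLC release.

```python
import json
from typing import Optional


def book_jsonld(oclc_number: str, title: str, author_name: str,
                author_uri: Optional[str] = None) -> str:
    """Describe a book with schema.org terms as a JSON-LD string."""
    author = {"@type": "Person", "name": author_name}
    if author_uri:
        # Linking to an external authority URI (e.g. VIAF) is what makes
        # the description linked data rather than an isolated record.
        author["@id"] = author_uri
    doc = {
        "@context": "http://schema.org/",
        "@id": f"http://www.worldcat.org/oclc/{oclc_number}",
        "@type": "Book",
        "name": title,
        "author": author,
    }
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    print(book_jsonld("123456789",  # placeholder OCLC number
                      "Pride and prejudice", "Austen, Jane",
                      "http://viaf.org/viaf/102333412"))
```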
DATA LICENSING ISSUES
WorldCat record use policy
New policy created by members, reviewed by the community, and implemented on 1 August 2010: http://www.oclc.org/worldcat/recorduse/default.htm
• Scope = copies of WorldCat records, not the WorldCat database itself
• A code of good practice for members of a cooperative, based on shared values, trust and reciprocity in understanding rights and responsibilities, instead of data ownership, detailed provisions or restrictions
• Outlines rights to transfer data to individuals, consortia and public agencies, other libraries and scholarly institutions (members or non-members), and third parties
OCLC: OWNED AND GOVERNED BY MEMBER LIBRARIES
[Figure: OCLC governance. Members: 25,900+ institutions; Regional Councils: 3 councils; Global Council: 48 members; Board of Trustees: 16 trustees.]
DATA LICENSING ISSUES
Recommendation to adopt ODC-BY (Open Data Commons Attribution License).
• Experimenting with Cambridge (http://cul-comet.blogspot.com/)
• Best practice: give attribution to OCLC (somewhere on your site), show awareness of the OCLC cooperative community norms and provide a link to these norms: http://www.oclc.org/worldcat/recorduse/policy/odcbynorms.htm.
• Recommendation discussed with the Global Council (April 2012);
• Appointment of Richard Wallis to help solve remaining issues and shape OCLC’s approach to data sharing.
DISCUSSION
• Continue to involve the library community in further research:
  • the future of cataloguing: using linked data from the start
  • citations: using linked data from the start
  • helping build a higher-quality data service
• Uses that might be made of OCLC datasets that will be released:
  • Deduplication/disambiguation (metadata aggregations; IRs)
  • Mapping scholarly debate