Providing Semantic and Bibliographic Data for
Library Discovery
Cathy Dolbear
Senior Link Architect, Data Strategy
Oxford University Press
Semantic and Bibliographic Data for Library Discovery / Cathy Dolbear 16th October 2015
Overview
• Introduction to OUP
• Our relationship with libraries as a publisher
• Industry trends for publication and delivery of metadata
• Semantic versus Bibliographic Metadata
• Where next?
Introduction to OUP
Meet the Press…
How library users find our content
[Diagram. Labels: OUP; Other publishers; Academic Catalogue (print + online); Journals; Online Products; Bibliographic metadata; Semantic metadata (“authorities”); Library Management System; Discovery Services; search engines, specialised databases; User]
Our relationship with libraries
Customers; Discovery Portals
• Customers
– but not users
– make purchasing decisions based on metadata-driven usage statistics
• Discovery portals
– Discovery services (ProQuest, ExLibris, OCLC etc.) => XML metadata feeds
• But…
– Main referrers are search engines (Google/Scholar, Bing, Yahoo!) => Google Scholar markup/RDFa/JSON-LD
– Users arrive via direct links (NIH PubMed, escardio) => entity recognition
Discovery data
Many ways to slice the pie
• Entry referrers for a particular university consortium
How should we provide our metadata?
Industry trends for publication and delivery
• “Push” – direct metadata deliveries
• “Pull” – metadata publishing
– Linked Data Publishing platform?
[Diagram labels: OxMetaML, OAI-PMH, MARC 21, Dublin Core, JSON-LD]
• Publishers can’t just choose a single vocabulary/format
– Too many differing requirements/options
– Transform on delivery as required
• Simplify our internal metadata format
– Bibliographic & semantic information only
– Removing processing instructions
– Clear semantics makes linking/integration easier
– More likely we can publish our metadata without transformation
Decentralised web world: no single “standard” vocabulary
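A minimal sketch of "transform on delivery as required", assuming a simplified internal record that holds only bibliographic and semantic fields. The record layout, the field names, the title and the function names are all hypothetical and are not OUP's actual format; the Dublin Core and schema.org mappings are deliberately partial.

```python
import json

# Hypothetical internal record: bibliographic and semantic fields only,
# with no processing instructions. Field names are illustrative.
record = {
    "title": "Example article title",  # placeholder, not a real title
    "authors": ["Andreas J Bartsch"],
    "doi": "10.1093/brain/awl303",
    "subjects": [
        {"label": "Alcoholism", "scheme": "MeSH", "code": "C25.775.100.250"}
    ],
}


def to_dublin_core(rec):
    """Map the internal record to simple Dublin Core element names."""
    return {
        "dc:title": rec["title"],
        "dc:creator": list(rec["authors"]),
        "dc:identifier": "doi:" + rec["doi"],
        "dc:subject": [s["label"] for s in rec["subjects"]],
    }


def to_schema_org(rec):
    """Map the same record to a schema.org JSON-LD structure."""
    return {
        "@context": "http://schema.org",
        "@type": "ScholarlyArticle",
        "name": rec["title"],
        "author": [{"@type": "Person", "name": a} for a in rec["authors"]],
        "about": [{"@type": "Thing", "name": s["label"]} for s in rec["subjects"]],
    }


# One mapping per delivery target; supporting a new format means adding
# one function here, while the internal record stays unchanged.
TRANSFORMS = {"dublin-core": to_dublin_core, "json-ld": to_schema_org}


def deliver(rec, target):
    """Transform on delivery: apply the mapping for the requested target."""
    return TRANSFORMS[target](rec)


if __name__ == "__main__":
    print(json.dumps(deliver(record, "json-ld"), indent=2))
```

Keeping the internal record free of format-specific detail is what makes each mapping a short, independent function.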
What information do we provide?
Bibliographic versus Semantic metadata
• Bibliographic information (author, title, ISBN etc.)
• Semantic or contextual information – what the document is about (academic subject, person, organisation etc.)
Which vocabularies/ontologies?
so many standards to choose from…
• Semantic data integration is not straightforward
Why didn’t you use …?
[insert name of favourite vocabulary here]
• Semantic data integration is not straightforward
[Diagram labels: frbr:CorporateBody, frbr:Person, frbr:Object, frbr:Work]
Metadata publishing
Embedded markup in HTML
• Google Scholar meta-tags
– HighWire Press/PRISM tagged bibliographic data
– Full text indexed (unlike Google)
• RDFa (RDF in attributes)
– Currently published on our non-journals online products
– Using schema.org vocabulary
– RDFa distillers can scrape the metadata
– Only in HTML header => not fully recognised by Google
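As an illustration of the Google Scholar meta-tags mentioned above, the sketch below emits Highwire Press style tags for one article. The function name is hypothetical, and the title, date and journal passed in the usage example are placeholders rather than real bibliographic data; only a small subset of the available tag names is shown.

```python
from html import escape


def scholar_meta_tags(title, authors, publication_date, journal_title):
    """Return Highwire Press style <meta> tags, one per line."""
    pairs = [("citation_title", title)]
    # Google Scholar expects one citation_author tag per author.
    pairs += [("citation_author", author) for author in authors]
    pairs.append(("citation_publication_date", publication_date))
    pairs.append(("citation_journal_title", journal_title))
    return "\n".join(
        '<meta name="{}" content="{}">'.format(name, escape(value, quote=True))
        for name, value in pairs
    )


if __name__ == "__main__":
    print(
        scholar_meta_tags(
            title="Example article title",       # placeholder
            authors=["Andreas J Bartsch"],
            publication_date="2015/10/16",       # placeholder
            journal_title="Example Journal",     # placeholder
        )
    )
```

The tags belong in the HTML head of the article page, which is where the slide notes the existing RDFa also sits.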
Metadata publishing
JSON-LD
• Our RDFa not fully recognised by Google
– at the document, not object level
• Still want structured markup
– Improves click-through rate (30% reported by BestBuy)
– Search results more eye-catching as rich snippets
– Increases traffic (BBC reported 20%)
– Content indexed better
• Developing solution using JSON-LD
Metadata publishing
JavaScript Object Notation for Linked Data
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "MedicalScholarlyArticle",
  "publicationType": "V03.200",
  "doi": "10.1093/brain/awl303",
  "author": {
    "@type": "Person",
    "affiliation": "University of Wurzburg, Department of Neuroradiology, Wurzburg",
    "name": "Andreas J Bartsch"
  },
  "keywords": "alcoholism, morphometry, MR spectroscopy, SIENA, voxelwise SIENA statistics",
  "about": {
    "@type": "MedicalCondition",
    "name": "Alcoholism",
    "code": {
      "@id": "http://www.ncbi.nlm.nih.gov/mesh/C25.775.100.250",
      "@type": "MedicalCode",
      "code": "C25.775.100.250",
      "codingSystem": "MeSH"
    }
  }
}
</script>
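To show how a consumer might read such a block, here is a rough sketch that pulls JSON-LD out of a page using only the Python standard library. The class name, the function name and the cut-down PAGE sample are hypothetical; a real harvester would more likely use a dedicated JSON-LD or HTML library and handle malformed input.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.items.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# Cut-down sample page, based on the subject part of the slide's example.
PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org",
 "@type": "MedicalScholarlyArticle",
 "about": {"@type": "MedicalCondition", "name": "Alcoholism",
           "code": {"@type": "MedicalCode", "code": "C25.775.100.250",
                    "codingSystem": "MeSH"}}}
</script>
</head><body></body></html>"""


def extract_jsonld(html_text):
    """Return a list with one parsed object per JSON-LD script block."""
    parser = JsonLdExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.items


if __name__ == "__main__":
    article = extract_jsonld(PAGE)[0]
    subject = article["about"]
    print(subject["name"], subject["code"]["codingSystem"], subject["code"]["code"])
```

The "about" object is where the semantic data lives: a named condition plus a code from a stated coding system, which a consumer can match against its own authority files.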
Entity recognition
in-line content markup
• Wikidata have aligned Dictionary of National Biography people to VIAF
• Semantic enrichment programme underway, starting with medical entities – tagged with UMLS codes
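One very simple way to picture in-line content markup is dictionary lookup: known terms in the text are wrapped in an element that carries a concept code. The sketch below assumes plain-text input and a tiny hand-made lexicon; the codes are placeholders rather than real UMLS identifiers, and the span class and data-code attribute are illustrative choices, not a description of OUP's enrichment pipeline.

```python
import re

# Hypothetical lexicon: surface form -> concept code. The codes are
# placeholders, not real UMLS identifiers.
LEXICON = {
    "alcoholism": "EXAMPLE-0001",
    "mr spectroscopy": "EXAMPLE-0002",
}


def tag_entities(text, lexicon=LEXICON):
    """Wrap each known term in a <span> carrying its concept code.

    Assumes plain-text input with no existing markup.
    """
    # Longest terms first so a longer phrase wins over a shorter overlap.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )

    def wrap(match):
        code = lexicon[match.group(1).lower()]
        return '<span class="entity" data-code="{}">{}</span>'.format(
            code, match.group(1)
        )

    return pattern.sub(wrap, text)


if __name__ == "__main__":
    print(tag_entities("Alcoholism was studied with MR spectroscopy."))
```

Real entity recognition also has to disambiguate terms with several meanings, which plain lookup cannot do.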
Where next?
Mainstream consumption of bibliographic linked data
• Publisher-supplied metadata
– Simple, clean semantic and bibliographic data model
– Output to multiple standards/formats in the interim
– Increase tagging of our content/ entity linking
– Providing semantic disambiguation
• Requirements mainly driven by web search engines so far
• If we publish linked data, will it be incorporated into library search and indexing systems?
Any Questions?