Linking the Data: Building Effective Authority and Identity Lookup
Huda Khan and E. Lynette Rayle
Cornell University
Collaborators
Dave Eichmann (University of Iowa)
Simeon Warner and Dean Krafft (Cornell)
December 6, 2017, Linked Data for Libraries - Labs
Overview
• Background and Motivation
• Examples
  • VitroLib
  • Hyrax
• Architecture overview
• Future work
• Questions
Background
• Mellon Foundation-funded LD4 Projects
• Transition library systems to linked data
• Link better, explore better
• Flat record -> Discrete entities with well-defined relationships
• String identifiers -> URIs
• Relationships with other linked data
Background
[Figure: a MARC record and a name authority file ("Made in America", 1980, Blues Brothers) compared with BIBFRAME / bibliotek-o entities with URIs: Work, Instance, Agent, RWO]
Background
“A cataloger is an individual responsible for the processes of description, subject analysis, classification, and authority control of library materials. Catalogers serve as the ‘foundation of all library service, as they are the ones who organize information in such a way as to make it easily accessible.’” (Emphasis mine)
From https://en.wikipedia.org/wiki/Cataloging
Background
• Traditional practices: Authority File
  • E.g., Name Authority Files, Subject Headings, Genre Forms from LOC
  • String as unique identifier, e.g., “Mark Twain”
• Tasks and workflows
  • Identification, “Aboutness”
  • Disambiguation
  • Context and original authority record
Background
• Goals: Design and architecture around accessing authorities
• VitroLib
  • Prototype cataloging editor
  • Creates/uses linked data
  • Enables lookup and use of authorities
• Hyrax
  • Samvera technology stack
  • Incorporate authorities into institutional repository records
VitroLib Demo
What just happened?
[Diagram: VitroLib Search Service -> Questioning Authority ("MAGIC", to be explained) -> LOC Genre Forms]
Search LOC Genre Form data: Query = animation
Translate to QA Service Request
Result with QA context:
  uri: http://id.loc.gov/gf2011026141
  label: “Clay animation television programs”
  context:
    “Alternate Label”: [
      “Claymation television programs”,
      “Sculptmation television programs”
    ] ...
Result as translated for VitroLib:
  uri: http://id.loc.gov/gf2011026141
  label: “Clay animation television programs”
  altLabelList: [
    “Claymation television programs”,
    “Sculptmation television programs”
  ] ...
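The translation between the two result shapes on this slide can be sketched roughly as follows. This is only an illustration in Python, not VitroLib's actual code; the function name `qa_to_vitrolib` is hypothetical, and the field names simply mirror the two shapes shown on the slide.

```python
from typing import Any, Dict, List


def qa_to_vitrolib(qa_results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Map QA-style results (with a 'context' block) to the flatter
    shape shown on the slide (uri, label, altLabelList)."""
    translated = []
    for item in qa_results:
        context = item.get("context") or {}
        translated.append({
            "uri": item["uri"],
            "label": item["label"],
            # A missing context or missing "Alternate Label" gives an empty list.
            "altLabelList": list(context.get("Alternate Label", [])),
        })
    return translated


# Sample data copied from the slide; the URI is written as it appears there.
sample = [{
    "uri": "http://id.loc.gov/gf2011026141",
    "label": "Clay animation television programs",
    "context": {"Alternate Label": ["Claymation television programs",
                                    "Sculptmation television programs"]},
}]
print(qa_to_vitrolib(sample))
```

Keeping the mapping in one small function means a new context field (dates, occupation, and so on) would only need one more line here.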
Hyrax Demo
Autocomplete: Saving String and URI
Authority: OCLC FAST; Subauthority: PersonName
Selected String and URI
Saves both string and URI
Selecting a Term using Lookup with Context
Getting more from the same authority
Getting more from other authorities
Architecture
Technical Motivation
• Linked data provides...
  • URIs that identify specific terms (as opposed to ambiguity of using strings)
  • Reconciliation to relate terms that are defined in separate authorities
• Goals of implementation...
  • Provide a single process to access many authorities
  • Provide efficient and reliable access to authorities
  • Provide a means for disambiguation that empowers library staff to make the most accurate selections
First Set of Challenges
1. Finding Documentation
2. Linked Data Access API, e.g., no support, partial support, requires login credentials, SPARQL query endpoint only
3. Varying Results Formats, e.g., rdf-xml, json-ld, turtle, n-triples, etc.
4. Varying Ontologies, e.g., SKOS, schema.org, madsrdf, dbpedia, geonames
Multi-Server Architecture
[Diagram, built up over several slides: Hyrax/VitroLib - UI for selecting an entry from an authority; QA - normalize RDF returned from an authority; Direct Access of External Authority]
Query sent from the UI to QA:
  http://localhost:3000/qa/search/linked_data/oclc_fast/personal_name?q=twain&maximumRecords=2
Query sent from QA to the external authority:
  http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&sortKeys=usage&maximumRecords=2
RDF returned by the authority:
  <http://id.worldcat.org/fast/31622>
    a schema:Person ;
    dcterms:identifier "31622" ;
    skos:prefLabel "Twain, Mark, 1835-1910" ;
    skos:altLabel "Make Teviin, 1835-1910", "Make Tuwen, 1835-1910" .
  <http://id.worldcat.org/fast/365563>
    a schema:Person ;
    dcterms:identifier "365563" ;
    skos:prefLabel "Twain, Shania" ;
    skos:altLabel "Twain, Eilleen", "Edwards, Eilleen" .
Normalized results returned from QA to the UI:
  [{"uri": "http://id.worldcat.org/fast/31622", "id": "31622", "label": "Twain, Mark, 1835-1910"},
   {"uri": "http://id.worldcat.org/fast/365563", "id": "365563", "label": "Twain, Shania"}]
Direct Access Query API
Direct against authority...
http://experimental.worldcat.org/fast/search
  ?query=oclc.personalName+%22twain%22
  &maximumRecords=2
http://api.geonames.org/search
  ?q=ithaca
  &maxRows=2
  &username=demo
  &type=rdf
http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search
  ?query=milk
  &lang=en
  &maxhits=2
Normalized Query API
Through QA normalization layer...
http://localhost:3000/qa/search/linked_data/oclc_fast
  ?q=twain
  &maxRecords=2
http://localhost:3000/qa/search/linked_data/geonames
  ?q=ithaca
  &maxRecords=2
http://localhost:3000/qa/search/linked_data/agrovoc
  ?q=milk
  &maxRecords=2
  &lang=en
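To make the contrast between the two query styles concrete, here is a minimal Python sketch that builds both kinds of URL from the same inputs. It is not how Questioning Authority itself is implemented (QA is a Ruby gem driven by per-authority configuration files); the `DIRECT_TEMPLATES` table and both function names are hypothetical, and the templates are copied from the example URLs on these two slides.

```python
from urllib.parse import quote, urlencode

# Hypothetical per-authority templates, modeled on the direct URLs shown above.
DIRECT_TEMPLATES = {
    "oclc_fast": ("http://experimental.worldcat.org/fast/search"
                  "?query=oclc.personalName+%22{q}%22&maximumRecords={max}"),
    "geonames": ("http://api.geonames.org/search"
                 "?q={q}&maxRows={max}&username=demo&type=rdf"),
    "agrovoc": ("http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search"
                "?query={q}&lang=en&maxhits={max}"),
}

QA_BASE = "http://localhost:3000/qa/search/linked_data"


def normalized_url(authority, q, max_records, **extra):
    """One URL pattern for every authority; extra parameters are appended."""
    params = {"q": q, "maxRecords": max_records, **extra}
    return f"{QA_BASE}/{authority}?" + urlencode(params)


def direct_url(authority, q, max_records):
    """Authority-specific URL: each source needs its own parameter names."""
    return DIRECT_TEMPLATES[authority].format(q=quote(q), max=max_records)


for name, term in [("oclc_fast", "twain"), ("geonames", "ithaca"), ("agrovoc", "milk")]:
    print(normalized_url(name, term, 2))
    print(direct_url(name, term, 2))
```

The caller only ever varies the authority name and the query; the differences in parameter names (`maximumRecords`, `maxRows`, `maxhits`) stay inside the template table.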
Normalized Results
OCLC FAST:
  [{"uri": "http://id.worldcat.org/fast/31622", "id": "31622", "label": "Twain, Mark, 1835-1910"},
   {"uri": "http://id.worldcat.org/fast/365563", "id": "365563", "label": "Twain, Shania"}]
GeoNames:
  [{"uri": "http://sws.geonames.org/2162552/", "id": "http://sws.geonames.org/2162552/", "label": "Ithaca (AU)"},
   {"uri": "http://sws.geonames.org/4515289/", "id": "http://sws.geonames.org/4515289/", "label": "Ithaca (US)"}]
AgroVoc:
  [{"uri": "http://aims.fao.org/aos/agrovoc/c_8602", "id": "http://aims.fao.org/aos/agrovoc/c_8602", "label": "acidophilus milk"},
   {"uri": "http://aims.fao.org/aos/agrovoc/c_16076", "id": "http://aims.fao.org/aos/agrovoc/c_16076", "label": "buffalo milk"}]
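The normalization that produces these uri/id/label results can be sketched as below. This is an assumption-laden illustration, not QA's code: triples are represented as plain Python tuples rather than parsed RDF, the `normalize` function and its parameters are hypothetical, and the choice of which predicate supplies the label or id stands in for what a per-authority configuration would declare.

```python
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
SKOS_ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"
DCTERMS_IDENTIFIER = "http://purl.org/dc/terms/identifier"


def normalize(triples, label_predicate=SKOS_PREF_LABEL, id_predicate=None):
    """Reduce (subject, predicate, object) triples to uri/id/label records.

    When no id predicate is configured, the id defaults to the URI,
    as in the GeoNames and AgroVoc results on the slide.
    """
    results = {}  # dicts keep insertion order, so result order follows the input
    for subject, predicate, obj in triples:
        entry = results.setdefault(subject, {"uri": subject, "id": subject, "label": ""})
        if predicate == label_predicate:
            entry["label"] = obj
        elif id_predicate is not None and predicate == id_predicate:
            entry["id"] = obj
    return list(results.values())


fast = "http://id.worldcat.org/fast/31622"
triples = [
    (fast, DCTERMS_IDENTIFIER, "31622"),
    (fast, SKOS_PREF_LABEL, "Twain, Mark, 1835-1910"),
    (fast, SKOS_ALT_LABEL, "Make Teviin, 1835-1910"),  # ignored by this sketch
]
print(normalize(triples, id_predicate=DCTERMS_IDENTIFIER))
```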
Second Set of Challenges
5. Reliability & Efficiency, e.g., server uptime, server load
6. Accuracy, e.g., select results based on usage data, lexical match, custom weighting, other
7. Order Ranking, e.g., how to order a graph?
Cache Server Query Process
[Diagram, built up over several slides: JSP Query API in front of a Lucene/SOLR Index and a Jena-Fuseki Triplestore; one full setup per authority]
1. Query arrives at the JSP Query API:
   http://services.ld4l.org/ld4l_services/loc_name_batch.jsp?query=ezra%20cornell&maxRecords=10
2. Lucene search for "ezra cornell"; index built with predicate values <skos:prefLabel> and <skos:altLabel>
3. For each result: extract search rank, extract URI
4. SPARQL query for URI against the Jena-Fuseki Triplestore
5. Insert search rank in predicate <http://vivoweb.org/ontology/core#rank>
6. Combine all results
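The query process on these slides can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the cache server's implementation (which uses JSP, Lucene and Jena-Fuseki): the label index and the triplestore are plain in-memory structures, `lucene_search` and `fetch_triples` are hypothetical stand-ins for the Lucene search and the SPARQL query, and the word-count scoring is invented purely for illustration.

```python
RANK_PREDICATE = "http://vivoweb.org/ontology/core#rank"


def lucene_search(index, query, max_records):
    """Stand-in for the Lucene search: score each URI by how many query
    words occur in its indexed prefLabel/altLabel values."""
    words = query.lower().split()
    scored = []
    for uri, labels in index.items():
        text = " ".join(labels).lower()
        score = sum(1 for word in words if word in text)
        if score:
            scored.append((score, uri))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
    return [uri for _, uri in scored[:max_records]]


def fetch_triples(store, uri):
    """Stand-in for the SPARQL query that retrieves one URI's triples."""
    return [triple for triple in store if triple[0] == uri]


def cache_query(index, store, query, max_records=10):
    """Search the index, fetch each hit's triples, attach the search rank
    as an extra triple, and combine everything into one result list."""
    combined = []
    hits = lucene_search(index, query, max_records)
    for rank, uri in enumerate(hits, start=1):
        combined.extend(fetch_triples(store, uri))
        combined.append((uri, RANK_PREDICATE, rank))
    return combined
```

Carrying the rank as a triple is what lets an otherwise unordered graph of results be put back in search order by the consumer.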
UI-QA-Authority
[Diagram: Hyrax/VitroLib - UI for selecting an entry from an authority; QA - normalize RDF returned from an authority; sources shown: Direct Access of External Authority; Active-Triples; LDF Cache (Marmotta or Blazegraph); LDF Cache; Jena-Fuseki-Lucene Cache (search of cache performed via Lucene index)]
Query to QA:
  http://localhost:3000/qa/search/linked_data/oclc_fast/personal_name?q=twain&maximumRecords=2
Query to the authority:
  http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&sortKeys=usage&maximumRecords=2
RDF of search results, normalized by QA to:
  [{"uri": "http://id.worldcat.org/fast/31622", "id": "31622", "label": "Twain, Mark, 1835-1910"},
   {"uri": "http://id.worldcat.org/fast/365563", "id": "365563", "label": "Twain, Shania"}]
Third Set of Challenges
8. Disambiguation through better context, e.g., expand from just prefLabel to... prefLabel, altLabel, birth/death dates, occupation, etc.
9. Reconciliation across multiple sources, e.g., match LoC URI to OCLC FAST URI
What's next?
Addressing Architectural Challenges
• Generalize process for accessing context on the cache server and in the normalization layer
• Multi-authority search and reconciliation
• Address the need for cache refresh
• Mirrored cache servers
User Experience and Design
• User-centered Design
  • Observe, listen, learn, design, evaluate, iterate
• Iteratively design and evaluate UI for lookup/authorities with catalogers
• Search result ranking/ordering/filtering for catalogers
• Additional UI platforms, e.g., FOLIO
Questions
http://tinyurl.com/ld4l-auth-access
Appendix for Challenges 1-4
Challenge 1: Documentation
LoC: http://id.loc.gov/techcenter/
  C. Harlow notes on reconciling LoC: https://github.com/cmh2166/lc-reconcile
OCLC FAST: https://www.oclc.org/developer/develop/web-services/fast-api/linked-data.en.html
GeoNames: http://www.geonames.org/export/geonames-search.html
AGROVOC: http://aims.fao.org/vest-registry/vocabularies/agrovoc-multilingual-agricultural-thesaurus
  swagger config: https://github.com/NatLibFi/Skosmos/blob/master/swagger.json
NALT: https://agclass.nal.usda.gov/
DBpedia: http://wiki.dbpedia.org/OnlineAccess#1.2%20Public%20Faceted%20Web%20Service%20Interface
Challenge 2: Linked Data Access API
LoC
  for Search Query: not supported
  for Term Fetch: URI
OCLC FAST
  for Search Query: http://experimental.worldcat.org/fast/search?query={subauth}+all+%22{query}%22&sortKeys=usage&maximumRecords={maximumRecords}
  for Term Fetch: URI
GeoNames
  for Search Query: http://api.geonames.org/search?q={query}&maxRows={maxRows}&username={username}&type=rdf
  for Term Fetch: URI
AGROVOC
  for Search Query: http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search?query={query}&lang={lang}
  for Term Fetch: http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/data?uri=http://aims.fao.org/aos/agrovoc/{term_id}
NALT
  for Search Query: http://skosmos.library.cornell.edu/rest/v1/nalt/search?query={query}&lang={lang}
  for Term Fetch: http://skosmos.library.cornell.edu/rest/v1/nalt/data?uri={term_uri}
DBpedia
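Challenge 3: Varying Results Formats
LoC
  for Search Query: not supported
  for Term Fetch: rdf-xml
OCLC FAST
  for Search Query: rdf-xml
  for Term Fetch: rdf-xml
GeoNames
  for Search Query: rdf-xml
  for Term Fetch: rdf-xml
AGROVOC
  for Search Query: json-ld
  for Term Fetch: rdf-xml, json-ld, turtle
NALT
  for Search Query: json-ld
  for Term Fetch: rdf-xml, json-ld, turtle
DBpedia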
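Challenge 4: Varying Ontologies
LoC
  Primary Ontology: madsrdf, SKOS
  Flat vs. Navigation required: navigation required
OCLC FAST
  Primary Ontology: schema.org, SKOS
  Flat vs. Navigation required: flat
GeoNames
  Primary Ontology: geonames
  Flat vs. Navigation required: flat, hierarchical
AGROVOC
  Primary Ontology: SKOS
  Flat vs. Navigation required: flat, hierarchical
NALT
  Primary Ontology: SKOS
  Flat vs. Navigation required: flat, hierarchical
DBpedia
  Primary Ontology: dbpedia
  Flat vs. Navigation required: flat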
Configurations for Questioning Authority
LoC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_loc/config/authorities/linked_data
OCLC FAST: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_oclcfast/config/authorities/linked_data
GeoNames: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_geonames/config/authorities/linked_data
AGROVOC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_agrovoc/config/authorities/linked_data
NALT: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_nalt/config/authorities/linked_data
DBpedia: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_dbpedia/config/authorities/linked_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
• 8-core, 64 GB, 3 GHz Mac Pro (late 2013), macOS Sierra (10.12.6)
• 32 TB Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
• Apache Jena Fuseki 2.4.0 provides SPARQL endpoint
• Apache Tomcat 9.0 runs custom web application(s)
• Apache Lucene 3.6 provides search interface
Customizations
• custom per-data-source JSP web application provides search/browse/download functionality
• custom (generic) SPARQL Tag Library provides API for web apps (available at https://github.com/eichmann/lod-utilities)
• custom (generic) Lucene Tag Library provides API for web apps
Loading a New Vocabulary
• download RDF
• if necessary, convert to n-triples (required for GeoNames data, for instance)
• use tdbloader2 to populate triplestore
• configure Fuseki server(s) with triplestore details
• create new JSP project in Eclipse
• write one or more indexer programs that populate Lucene indices, and run indexer(s)
• write search/browse/download application logic using the SPARQL and Lucene tags
• package project as WAR
• deploy to Apache Tomcat server(s)
• add new service to Apache HTTPD virtual host specification
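As an illustration of the indexer step in the list above, the following Python sketch builds a small word-to-URI index from the label predicates of an n-triples file. It is only a stand-in for the idea, not the project's indexer: the real indexers populate Lucene indices, while this sketch uses an in-memory dictionary, and the function name `build_label_index` and the simplified line pattern are assumptions.

```python
import re
from collections import defaultdict

# Predicates whose literal values are indexed, as on the query-process slides.
LABEL_PREDICATES = {
    "http://www.w3.org/2004/02/skos/core#prefLabel",
    "http://www.w3.org/2004/02/skos/core#altLabel",
}

# Simplified pattern for lines of the form: <subject> <predicate> "literal" ...
# Escaped characters inside the literal are matched but not unescaped.
TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+"((?:[^"\\]|\\.)*)"')


def build_label_index(ntriples_lines):
    """Map each lowercased word in a label literal to the set of subject URIs."""
    index = defaultdict(set)
    for line in ntriples_lines:
        match = TRIPLE.match(line.strip())
        if not match:
            continue  # skips comments, blank-node subjects, and URI-valued objects
        subject, predicate, literal = match.groups()
        if predicate in LABEL_PREDICATES:
            for word in re.findall(r"\w+", literal.lower()):
                index[word].add(subject)
    return index
```

Passing an open file object works as well as a list of strings, since the function only iterates over lines.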
UI Access to Cache Server
http://services.ld4l.org/ld4l_services/loc_name.jsp
Downloads
LoC: http://id.loc.gov/download/ (n-triples OR rdf-xml)
OCLC FAST: http://www.oclc.org/research/themes/data-science/fast/download.html (n-triples)
GeoNames: http://www.geonames.org/ontology/documentation.html (custom format - see notes for processing)
AGROVOC: https://aims-fao.atlassian.net/wiki/spaces/AGV/pages/2949126/Releases (n-triples OR rdf-xml)
NALT: https://agclass.nal.usda.gov/download.shtml (rdf-xml)
DBpedia: http://wiki.dbpedia.org/downloads-2016-04
Potential Options for Reconciliation
• VIAF for name reconciliation - we are doing some work with this
• Wikidata - I've heard that they are working on reconciliation issues, but haven't yet explored in depth
  • Intro Video (3hrs)
  • API Access
  • SPARQL - user manual
  • federated queries with other authorities
A Google search for "linked data reconciliation" returns a large number of articles and presentations on this concept.
Links to Code & More
• QA Server - Code for a small app that provides the Questioning Authority normalization layer
• Linked Data Authorities - Configurations that can be used with QA Server
• LD4L Services - UI access to Cache Server
• VitroLib - Code for the VitroLib cataloging tool
Background
bull Mellon Foundation-funded LD4 Projects
bull Transition library systems to linked data
bull Link better explore better
bull Flat record -gt Discrete entities with well-defined relationships
bull String identifiers -gt URIs
bull Relationships with other linked data
Background
4
Made in
America
1980
Made in
America
Blues
Brothers
Made in
America 1980 Blues
Brothers
Blues
Brothers
MARC
RECORD
NAME
AUTH
FILE
WORK
INSTANCE
AGENT
RWO
BIBFRAME
BIBLIOTEK-O
ENTITIES
WITH URIS
Background
ldquoA cataloger is an individual responsible for the processes of description subject analysis classification and authority control of library materials Catalogers serve as the lsquofoundation of all library service as they are the ones who organize information in such a way as to make it easily accessiblersquordquo (Emphasis mine)
From httpsenwikipediaorgwikiCataloging
Background
bull Traditional practices Authority File
bull Eg Name Authority Files Subject Headings Genre Forms from LOC
bull String as unique identifier eg ldquoMark Twainrdquo
bull Tasks and workflows
bull Identification ldquoAboutnessrdquo
bull Disambiguation
bull Context and original authority record
Background
bull Goals Design and architecture around accessing authorities
bull VitroLib
bull Prototype cataloging editor
bull Createsuses linked data
bull Enables lookup and use of authorities
bull Hyrax
bull Samvera technology stack
bull Incorporate authorities into institutional repository records
8
VitroLib Demo
What just happened?
[Diagram: VitroLib Search Service -> Questioning Authority ("MAGIC", to be explained) -> LOC Genre Forms]
• Search LOC Genre Form data: Query = animation
• Translate to QA Service Request
Result with context:
uri: http://id.loc.gov/.../gf2011026141
label: "Clay animation television programs"
context:
  "Alternate Label": [
    "Claymation television programs",
    "Sculptmation television programs"
  ] …
Result with alternate label list:
uri: http://id.loc.gov/.../gf2011026141
label: "Clay animation television programs"
altLabelList: [
  "Claymation television programs",
  "Sculptmation television programs"
] …
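The translation between the two result shapes on this slide could be sketched roughly as follows. This is an illustration only, not the VitroLib code: the function name `to_alt_label_list` and the assumption that `context` maps a display name to a list of strings are inferred from the slide, and the URI is copied in its abbreviated form.

```python
def to_alt_label_list(qa_result):
    """Flatten the "Alternate Label" context entry into an altLabelList field."""
    context = qa_result.get("context", {})
    return {
        "uri": qa_result["uri"],
        "label": qa_result["label"],
        # Copy the list so the caller's data is not shared with the new record.
        "altLabelList": list(context.get("Alternate Label", [])),
    }


# Sample record shaped like the slide (URI abbreviated as on the slide).
qa_result = {
    "uri": "http://id.loc.gov/.../gf2011026141",
    "label": "Clay animation television programs",
    "context": {
        "Alternate Label": [
            "Claymation television programs",
            "Sculptmation television programs",
        ]
    },
}

vitrolib_result = to_alt_label_list(qa_result)
```

A record with no context simply yields an empty `altLabelList`, so the UI can treat both cases the same way.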
Hyrax Demo
Autocomplete: Saving String and URI
• Authority: OCLC FAST; Subauthority: PersonName
• Selected String and URI
• Saves both string and URI
Selecting a Term using Lookup with Context
Getting more from the same authority
Getting more from other authorities
Architecture
Technical Motivation
• Linked data provides…
  • URIs that identify specific terms (as opposed to the ambiguity of using strings)
  • Reconciliation to relate terms that are defined in separate authorities
• Goals of implementation…
  • Provide a single process to access many authorities
  • Provide efficient and reliable access to authorities
  • Provide a means for disambiguation that empowers library staff to make the most accurate selections
First Set of Challenges
1. Finding Documentation
2. Linked Data Access API, e.g., no support, partial support, requires login credentials, SPARQL query endpoint only
3. Varying Results Formats, e.g., rdf-xml, json-ld, turtle, n-triples, etc.
4. Varying Ontologies, e.g., SKOS, schema.org, madsrdf, dbpedia, geonames
Multi-Server Architecture
Hyrax/VitroLib – UI for selecting an entry from an authority
QA – normalize RDF returned from an authority
Direct Access of External Authority

Request from the UI to QA:
http://localhost:3000/qa/search/linked_data/oclc_fast/personal_name?q=twain&maximumRecords=2

Request from QA to the authority:
http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&sortKeys=usage&maximumRecords=2

RDF returned by the authority:
<http://id.worldcat.org/fast/31622>
  a schema:Person ;
  dcterms:identifier "31622" ;
  skos:prefLabel "Twain, Mark, 1835-1910" ;
  skos:altLabel "Make Teviin, 1835-1910" ,
    "Make Tuwen, 1835-1910" .
<http://id.worldcat.org/fast/365563>
  a schema:Person ;
  dcterms:identifier "365563" ;
  skos:prefLabel "Twain, Shania" ;
  skos:altLabel "Twain, Eilleen" ,
    "Edwards, Eilleen" .

Normalized results returned by QA to the UI:
[{"uri":"http://id.worldcat.org/fast/31622","id":"31622","label":"Twain, Mark, 1835-1910"},
 {"uri":"http://id.worldcat.org/fast/365563","id":"365563","label":"Twain, Shania"}]
Direct Access Query API
Direct against authority…
http://experimental.worldcat.org/fast/search
  ?query=oclc.personalName+%22twain%22
  &maximumRecords=2
http://api.geonames.org/search
  ?q=ithaca
  &maxRows=2
  &username=demo
  &type=rdf
http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search
  ?query=milk
  &lang=en
  &maxhits=2
Normalized Query API
Through QA normalization layer…
http://localhost:3000/qa/search/linked_data/oclc_fast
  ?q=twain
  &maxRecords=2
http://localhost:3000/qa/search/linked_data/geonames
  ?q=ithaca
  &maxRecords=2
http://localhost:3000/qa/search/linked_data/agrovoc
  ?q=milk
  &maxRecords=2
  &lang=en
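One way to picture what the normalization layer does with a query is a per-authority table that maps the common parameters (search text, record limit) onto each authority's own parameter names. The sketch below is a Python illustration under that assumption. Questioning Authority itself is configured differently, and the `AUTHORITIES` table and `direct_url` function are hypothetical names; only the URLs and parameter names come from the slides.

```python
from urllib.parse import urlencode

# Hypothetical per-authority settings, modeled on the direct-access URLs above.
AUTHORITIES = {
    "oclc_fast": {
        "base": "http://experimental.worldcat.org/fast/search",
        "params": lambda q, n: {"query": f'oclc.personalName "{q}"',
                                "maximumRecords": n},
    },
    "geonames": {
        "base": "http://api.geonames.org/search",
        "params": lambda q, n: {"q": q, "maxRows": n,
                                "username": "demo", "type": "rdf"},
    },
    "agrovoc": {
        "base": "http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search",
        "params": lambda q, n: {"query": q, "lang": "en", "maxhits": n},
    },
}


def direct_url(authority, q, max_records):
    """Translate a normalized (authority, q, max_records) query into a direct URL."""
    cfg = AUTHORITIES[authority]
    # urlencode escapes spaces as "+" and quotes as "%22".
    return cfg["base"] + "?" + urlencode(cfg["params"](q, max_records))


fast_url = direct_url("oclc_fast", "twain", 2)
geonames_url = direct_url("geonames", "ithaca", 2)
agrovoc_url = direct_url("agrovoc", "milk", 2)
```

The caller only ever supplies the same three values; everything authority-specific lives in the table, which is the point of a single access process for many authorities.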
Normalized Results
OCLC FAST
[{"uri":"http://id.worldcat.org/fast/31622","id":"31622","label":"Twain, Mark, 1835-1910"},
 {"uri":"http://id.worldcat.org/fast/365563","id":"365563","label":"Twain, Shania"}]
GeoNames
[{"uri":"http://sws.geonames.org/2162552/","id":"http://sws.geonames.org/2162552/","label":"Ithaca (AU)"},
 {"uri":"http://sws.geonames.org/4515289/","id":"http://sws.geonames.org/4515289/","label":"Ithaca (US)"}]
AgroVoc
[{"uri":"http://aims.fao.org/aos/agrovoc/c_8602","id":"http://aims.fao.org/aos/agrovoc/c_8602","label":"acidophilus milk"},
 {"uri":"http://aims.fao.org/aos/agrovoc/c_16076","id":"http://aims.fao.org/aos/agrovoc/c_16076","label":"buffalo milk"}]
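The normalization step that produces these uniform uri/id/label records can be sketched as below. This is a simplified Python illustration, not the Questioning Authority implementation. It assumes the RDF has already been parsed into a dictionary of subjects, and the list of label predicates is an assumption about which property each ontology uses for its main label. The fallback of `id` to the URI mirrors the GeoNames and AgroVoc results above.

```python
# Assumed label predicates per ontology; real authority configurations may differ.
LABEL_PREDICATES = [
    "http://www.w3.org/2004/02/skos/core#prefLabel",      # SKOS
    "http://www.loc.gov/mads/rdf/v1#authoritativeLabel",  # madsrdf
    "http://www.geonames.org/ontology#name",              # geonames
]
ID_PREDICATE = "http://purl.org/dc/terms/identifier"


def normalize(subjects):
    """Turn {uri: {predicate: [values]}} into a list of {uri, id, label} records."""
    results = []
    for uri, props in subjects.items():
        # Use the first label predicate that has a value for this subject.
        label = next((props[p][0] for p in LABEL_PREDICATES if props.get(p)), "")
        # Fall back to the URI when the authority supplies no separate identifier.
        ident = props.get(ID_PREDICATE, [uri])[0]
        results.append({"uri": uri, "id": ident, "label": label})
    return results


fast_rdf = {
    "http://id.worldcat.org/fast/31622": {
        "http://purl.org/dc/terms/identifier": ["31622"],
        "http://www.w3.org/2004/02/skos/core#prefLabel": ["Twain, Mark, 1835-1910"],
    }
}
geonames_rdf = {
    "http://sws.geonames.org/4515289/": {
        "http://www.geonames.org/ontology#name": ["Ithaca"],
    }
}

fast_results = normalize(fast_rdf)
geonames_results = normalize(geonames_rdf)
```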
Second Set of Challenges
5. Reliability & Efficiency, e.g., server uptime, server load
6. Accuracy, e.g., select results based on usage data, lexical match, custom weighting, other
7. Order Ranking, e.g., how to order a graph
Cache Server Query Process
Components (one full setup per authority): JSP Query API, Lucene/SOLR Index, Jena-Fuseki Triplestore
1. Request to the JSP Query API:
   http://services.ld4l.org/ld4l_services/loc_name_batch.jsp?query=ezra%20cornell&maxRecords=10
2. Lucene search for "ezra cornell"
   (index built with predicate values <skos:prefLabel> and <skos:altLabel>)
3. For each result: extract search rank, extract URI
4. SPARQL query for URI
5. Insert search rank in predicate <http://vivoweb.org/ontology/core#rank>
6. Combine all results
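The query process can be sketched end to end with in-memory stand-ins. This Python illustration is not the JSP application: a plain dictionary plays the triplestore, a word-overlap score plays Lucene, and the example.org URIs and sample data are made up. Only the ordering of the steps and the rank predicate come from the slides.

```python
RANK = "http://vivoweb.org/ontology/core#rank"
PREF = "http://www.w3.org/2004/02/skos/core#prefLabel"
ALT = "http://www.w3.org/2004/02/skos/core#altLabel"

# Toy triplestore: {subject: {predicate: [values]}}. URIs and data are made up.
STORE = {
    "http://example.org/names/1": {PREF: ["Cornell, Ezra, 1807-1874"]},
    "http://example.org/names/2": {PREF: ["Cornell University"], ALT: ["Cornell"]},
    "http://example.org/names/3": {PREF: ["Twain, Mark, 1835-1910"]},
}


def build_index(store):
    """Index every prefLabel/altLabel value, lower-cased, against its subject URI."""
    return [(value.lower(), uri)
            for uri, props in store.items()
            for pred in (PREF, ALT)
            for value in props.get(pred, [])]


def search(index, query, max_records):
    """Crude stand-in for Lucene: score = number of query words found in a label."""
    words = query.lower().split()
    scores = {}
    for text, uri in index:
        score = sum(word in text for word in words)
        if score:
            scores[uri] = max(score, scores.get(uri, 0))
    ranked = sorted(scores, key=lambda u: (-scores[u], u))
    return ranked[:max_records]


def query_cache(store, query, max_records=10):
    index = build_index(store)
    combined = {}
    for rank, uri in enumerate(search(index, query, max_records), start=1):
        triples = dict(store[uri])   # stand-in for the SPARQL query for this URI
        triples[RANK] = [rank]       # insert the search rank as a predicate
        combined[uri] = triples      # combine all results
    return combined


results = query_cache(STORE, "ezra cornell")
```

Carrying the rank as a predicate keeps the ordering inside the returned graph, so a consumer can sort on it after parsing.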
UI-QA-Authority
Hyrax/VitroLib – UI for selecting an entry from an authority
QA – normalize RDF returned from an authority

Request from the UI to QA:
http://localhost:3000/qa/search/linked_data/oclc_fast/personal_name?q=twain&maximumRecords=2

Normalized results returned by QA:
[{"uri":"http://id.worldcat.org/fast/31622","id":"31622","label":"Twain, Mark, 1835-1910"},
 {"uri":"http://id.worldcat.org/fast/365563","id":"365563","label":"Twain, Shania"}]

Request from QA to the source:
http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&sortKeys=usage&maximumRecords=2

The source returns RDF of search results.
[Diagram labels for sources: Active-Triples; LDF Cache (Marmotta or Blazegraph); LDF Cache; Jena-Fuseki-Lucene Cache; Direct Access of External Authority]
Search of cache performed via Lucene index
Third Set of Challenges
8. Disambiguation through better context, e.g., expand from just prefLabel to… prefLabel, altLabel, birth/death dates, occupation, etc.
9. Reconciliation across multiple sources, e.g., match LoC URI to OCLC FAST URI
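To make challenge 9 concrete, here is a deliberately naive Python sketch that pairs URIs from two authorities when their labels match after light cleanup. It is not an approach taken from the talk, and label matching alone is fragile, which is why extra context such as dates matters. The function names and the example.org URI are made up for illustration.

```python
import re


def label_key(label):
    """Lower-case and drop punctuation so formatting differences do not block a match."""
    return " ".join(re.sub(r"[^\w\s-]", "", label.lower()).split())


def reconcile(results_a, results_b):
    """Pair URIs from two normalized result lists whose cleaned labels are equal."""
    by_key = {label_key(r["label"]): r["uri"] for r in results_b}
    return [(r["uri"], by_key[label_key(r["label"])])
            for r in results_a
            if label_key(r["label"]) in by_key]


fast = [
    {"uri": "http://id.worldcat.org/fast/31622", "label": "Twain, Mark, 1835-1910"},
    {"uri": "http://id.worldcat.org/fast/365563", "label": "Twain, Shania"},
]
# Made-up URI standing in for a record from a second authority.
other = [{"uri": "http://example.org/names/twain", "label": "Twain, Mark, 1835-1910."}]

pairs = reconcile(fast, other)
```

Because the dates are part of the label, "Twain, Mark, 1835-1910" and "Twain, Shania" never collide; a label without such context would be far more ambiguous.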
What's next?
Addressing Architectural Challenges
• Generalize process for accessing context on the cache server and in the normalization layer
• Multi-authority search and reconciliation
• Address the need for cache refresh
• Mirrored cache servers
User Experience and Design
• User-centered Design
  • Observe, listen, learn, design, evaluate, iterate
• Iteratively design and evaluate UI for lookup/authorities with catalogers
• Search result ranking/ordering/filtering for catalogers
• Additional UI platforms, e.g., FOLIO
Questions?
http://tinyurl.com/ld4l-auth-access
Appendix for Challenges 1-4
Challenge 1: Documentation
LoC: http://id.loc.gov/techcenter
  C. Harlow notes on reconciling LoC: https://github.com/cmh2166/lc-reconcile
OCLC FAST: https://www.oclc.org/developer/develop/web-services/fast-api/linked-data.en.html
GeoNames: http://www.geonames.org/export/geonames-search.html
AGROVOC: http://aims.fao.org/vest-registry/vocabularies/agrovoc-multilingual-agricultural-thesaurus
  swagger config: https://github.com/NatLibFi/Skosmos/blob/master/swagger.json
NALT: https://agclass.nal.usda.gov
DBpedia: http://wiki.dbpedia.org/OnlineAccess#1.2%20Public%20Faceted%20Web%20Service%20Interface
Challenge 2: Linked Data Access API
LoC
  for Search Query: not supported
  for Term Fetch: URI
OCLC FAST
  for Search Query: http://experimental.worldcat.org/fast/search?query={subauth}+all+%22{query}%22&sortKeys=usage&maximumRecords={maximumRecords}
  for Term Fetch: URI
GeoNames
  for Search Query: http://api.geonames.org/search?q={query}&maxRows={maxRows}&username={username}&type=rdf
  for Term Fetch: URI
AGROVOC
  for Search Query: http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search?query={query}&lang={lang}
  for Term Fetch: http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/data?uri=http://aims.fao.org/aos/agrovoc/{term_id}
NALT
  for Search Query: http://skosmos.library.cornell.edu/rest/v1/nalt/search?query={query}&lang={lang}
  for Term Fetch: http://skosmos.library.cornell.edu/rest/v1/nalt/data?uri={term_uri}
DBpedia
Challenge 3: Varying Results Formats
Authority    for Search Query    for Term Fetch
LoC          not supported       rdf-xml
OCLC FAST    rdf-xml             rdf-xml
GeoNames     rdf-xml             rdf-xml
AGROVOC      json-ld             rdf-xml, json-ld, turtle
NALT         json-ld             rdf-xml, json-ld, turtle
DBpedia
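A client that talks to several authorities has to keep track of which serialization to ask for. The Python sketch below maps the format names in the table to their standard media types and builds a term-fetch request carrying an Accept header. Whether a given authority honors content negotiation, rather than requiring a format parameter or a file extension, is an assumption to verify per authority; `fetch_request` is a hypothetical helper and the request is built but never sent.

```python
from urllib.request import Request

# Standard media types for the serializations named in the table.
MIME_TYPES = {
    "rdf-xml": "application/rdf+xml",
    "json-ld": "application/ld+json",
    "turtle": "text/turtle",
    "n-triples": "application/n-triples",
}


def fetch_request(term_uri, fmt):
    """Build (but do not send) a request asking for one serialization of a term."""
    return Request(term_uri, headers={"Accept": MIME_TYPES[fmt]})


req = fetch_request("http://id.worldcat.org/fast/31622", "rdf-xml")
accept = req.get_header("Accept")
```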
Challenge 4: Varying Ontologies
Authority    Primary Ontology     Flat vs. Navigation Required
LoC          madsrdf, SKOS        navigation required
OCLC FAST    schema.org, SKOS     flat
GeoNames     geonames             flat, hierarchical
AGROVOC      SKOS                 flat, hierarchical
NALT         SKOS                 flat, hierarchical
DBpedia      dbpedia              flat
Configurations for Questioning Authority
LoC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_loc/config/authorities/linked_data
OCLC FAST: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_oclcfast/config/authorities/linked_data
GeoNames: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_geonames/config/authorities/linked_data
AGROVOC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_agrovoc/config/authorities/linked_data
NALT: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_nalt/config/authorities/linked_data
DBpedia: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_dbpedia/config/authorities/linked_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
• 8-core, 64 GB, 3 GHz Mac Pro (late 2013), macOS Sierra (10.12.6)
• 32 TB Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
• Apache Jena Fuseki 2.4.0 provides SPARQL endpoint
• Apache Tomcat 9.0 runs custom web application(s)
• Apache Lucene 3.6 provides search interface
Customizations
• custom per-data-source JSP web application provides search/browse/download functionality
• custom (generic) SPARQL Tag Library provides API for web apps (available at https://github.com/eichmann/lod-utilities)
• custom (generic) Lucene Tag Library provides API for web apps
Loading a New Vocabulary
• download RDF
• if necessary, convert to n-triples (required for GeoNames data, for instance)
• use tdbloader2 to populate the triplestore
• configure Fuseki server(s) with triplestore details
• create new JSP project in Eclipse
• write one or more indexer programs that populate Lucene indices, and run the indexer(s)
• write search/browse/download application logic using the SPARQL and Lucene tags
• package project as a WAR
• deploy to Apache Tomcat server(s)
• add new service to Apache HTTPD virtual host specification
UI Access to Cache Server
http://services.ld4l.org/ld4l_services/loc_name.jsp
Downloads
LoC: http://id.loc.gov/download (n-triples OR rdf-xml)
OCLC FAST: http://www.oclc.org/research/themes/data-science/fast/download.html (n-triples)
GeoNames: http://www.geonames.org/ontology/documentation.html (custom format – see notes for processing)
AGROVOC: https://aims-fao.atlassian.net/wiki/spaces/AGV/pages/2949126/Releases (n-triples OR rdf-xml)
NALT: https://agclass.nal.usda.gov/downloads.html (rdf-xml)
DBpedia: http://wiki.dbpedia.org/downloads-2016-04
Potential Options for Reconciliation
• VIAF for name reconciliation – we are doing some work with this
• Wikidata – I've heard that they are working on reconciliation issues but haven't yet explored in depth
  • Intro Video (3 hrs)
  • API Access
  • SPARQL – user manual
  • federated queries with other authorities
Background
ldquoA cataloger is an individual responsible for the processes of description subject analysis classification and authority control of library materials Catalogers serve as the lsquofoundation of all library service as they are the ones who organize information in such a way as to make it easily accessiblersquordquo (Emphasis mine)
From httpsenwikipediaorgwikiCataloging
Background
bull Traditional practices Authority File
bull Eg Name Authority Files Subject Headings Genre Forms from LOC
bull String as unique identifier eg ldquoMark Twainrdquo
bull Tasks and workflows
bull Identification ldquoAboutnessrdquo
bull Disambiguation
bull Context and original authority record
Background
bull Goals Design and architecture around accessing authorities
bull VitroLib
bull Prototype cataloging editor
bull Createsuses linked data
bull Enables lookup and use of authorities
bull Hyrax
bull Samvera technology stack
bull Incorporate authorities into institutional repository records
8
VitroLib Demo
9
What just happened
Questioning Authority
MAGIC (To Be Explained)
VitroLib Search Service
LOC Genre Forms
Search LOC Genre Form
data
Query = animation
Translate to QA Service
Request
urihttpidlocgovgf2011026141
label ldquoClay animation television programsrdquo
context
ldquoAlternate Labelrdquo [
ldquoClaymation television programsrdquo
ldquoSculptmation television programsrdquo
] hellip
urihttpidlocgovgf2011026141
label ldquoClay animation television programsrdquo
altLabelList [
ldquoClaymation television programsrdquo
ldquoSculptmation television programsrdquo
] hellip
23
Hyrax Demo
Autocomplete Saving String and URI
Authority OCLC FAST Subauthority PersonName
Selected String and URI
Saves both string and URI
Selecting a Term using
Lookup with Context
26
Selecting a Term using
Lookup with Context
27
Getting more from the same authority
Getting more from other authorities
30
Architecture
Technical Motivation
bull Linked data provideshellip
bull URIs that identify specific terms (as opposed to ambiguity of using
strings)
bull Reconciliation to relate terms that are defined in separate authorities
bull Goals of implementationhellip
bull Provide a single process to access many authorities
bull Provide efficient and reliable access to authorities
bull Provide a means for disambiguation that empowers library staff to
make the most accurate selections
First Set of Challenges
1 Finding Documentation
2 Linked Data Access API eg no support partial support requires login credentials
sparql query endpoint only
3 Varying Results Formats eg rdf-xml json-ld turtle n-triples etc
4 Varying Ontologies eg SKOS schemaorg madsrdf dbpedia geonames
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
httpexperimentalworldcatorgfast
searchquery=oclcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
httpexperimentalworldcatorgfast
searchquery=oclcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
lthttpidworldcatorgfast31622gt
a schemaPerson
dctermsidentifier 31622
skosprefLabel Twain Mark 1835-1910
skosaltLabel Make Teviin 1835-1910
Make Tuwen 1835-1910
lthttpidworldcatorgfast365563gt
a schemaPerson
dctermsidentifier 365563
skosprefLabel Twain Shania
skosaltLabel Twain Eilleen
Edwards Eilleen
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
Hyrax/VitroLib – UI for selecting an entry from an authority
    http://localhost:3000/qa/search/linked_data/oclc_fast/personal_name?q=twain&maximumRecords=2
    [{"uri":"http://id.worldcat.org/fast/31622","id":"31622","label":"Twain, Mark, 1835-1910"},
     {"uri":"http://id.worldcat.org/fast/365563","id":"365563","label":"Twain, Shania"}]
QA – normalize RDF returned from an authority
    http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&sortKeys=usage&maximumRecords=2
    <http://id.worldcat.org/fast/31622>
        a schema:Person ;
        dcterms:identifier "31622" ;
        skos:prefLabel "Twain, Mark, 1835-1910" ;
        skos:altLabel "Make Teviin, 1835-1910", "Make Tuwen, 1835-1910" .
    <http://id.worldcat.org/fast/365563>
        a schema:Person ;
        dcterms:identifier "365563" ;
        skos:prefLabel "Twain, Shania" ;
        skos:altLabel "Twain, Eilleen", "Edwards, Eilleen" .
Direct Access of External Authority
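The normalization step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the Questioning Authority implementation: it assumes the RDF has already been parsed into (subject, predicate, object) tuples, and the function name `normalize` and the rule of falling back to the URI when no identifier is present are assumptions made for the example.

```python
import json

SKOS = "http://www.w3.org/2004/02/skos/core#"
DCTERMS = "http://purl.org/dc/terms/"

# Triples as an authority might return them, already parsed into
# (subject, predicate, object) tuples. Values are taken from the slide.
TRIPLES = [
    ("http://id.worldcat.org/fast/31622", DCTERMS + "identifier", "31622"),
    ("http://id.worldcat.org/fast/31622", SKOS + "prefLabel", "Twain, Mark, 1835-1910"),
    ("http://id.worldcat.org/fast/31622", SKOS + "altLabel", "Make Teviin, 1835-1910"),
    ("http://id.worldcat.org/fast/365563", DCTERMS + "identifier", "365563"),
    ("http://id.worldcat.org/fast/365563", SKOS + "prefLabel", "Twain, Shania"),
]


def normalize(triples):
    """Collapse RDF triples into one {uri, id, label} record per subject."""
    records = {}  # dicts keep insertion order, so subject order is preserved
    for subject, predicate, obj in triples:
        # Assumption: when no identifier triple exists, the URI doubles as the id.
        record = records.setdefault(
            subject, {"uri": subject, "id": subject, "label": ""}
        )
        if predicate == DCTERMS + "identifier":
            record["id"] = obj
        elif predicate == SKOS + "prefLabel":
            record["label"] = obj
    return list(records.values())


results = normalize(TRIPLES)
print(json.dumps(results, indent=2))
```

Alternate labels are dropped here on purpose, since the normalized result on the slide carries only uri, id, and label.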
Direct Access Query API
Direct against authority…
http://experimental.worldcat.org/fast/search
    ?query=oclc.personalName+%22twain%22
    &maximumRecords=2
http://api.geonames.org/search
    ?q=ithaca
    &maxRows=2
    &username=demo
    &type=rdf
http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search
    ?query=milk
    &lang=en
    &maxhits=2
Normalized Query API
Through QA normalization layer…
http://localhost:3000/qa/search/linked_data/oclc_fast
    ?q=twain
    &maxRecords=2
http://localhost:3000/qa/search/linked_data/geonames
    ?q=ithaca
    &maxRecords=2
http://localhost:3000/qa/search/linked_data/agrovoc
    ?q=milk
    &maxRecords=2
    &lang=en
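To show why a single query shape helps a client, here is a small Python sketch that builds the three normalized URLs above with one function. The helper name `qa_search_url` is hypothetical; the base URL and the `q`, `maxRecords`, and `lang` parameters are the ones shown on the slide.

```python
from urllib.parse import urlencode

# Base of the normalization layer, as shown on the slide (local dev server).
QA_BASE = "http://localhost:3000/qa/search/linked_data"


def qa_search_url(authority, query, max_records=2, **extra):
    """Build a normalized search URL; only the authority name changes per source."""
    params = {"q": query, "maxRecords": max_records}
    params.update(extra)  # optional extras such as lang="en"
    return f"{QA_BASE}/{authority}?{urlencode(params)}"


urls = [
    qa_search_url("oclc_fast", "twain"),
    qa_search_url("geonames", "ithaca"),
    qa_search_url("agrovoc", "milk", lang="en"),
]
for url in urls:
    print(url)
```

Compare this with the direct calls, where the same idea needs three different parameter names (`maximumRecords`, `maxRows`, `maxhits`) and three different hosts.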
Normalized Results
OCLC FAST
    [{"uri":"http://id.worldcat.org/fast/31622","id":"31622","label":"Twain, Mark, 1835-1910"},
     {"uri":"http://id.worldcat.org/fast/365563","id":"365563","label":"Twain, Shania"}]
GeoNames
    [{"uri":"http://sws.geonames.org/2162552/","id":"http://sws.geonames.org/2162552/","label":"Ithaca (AU)"},
     {"uri":"http://sws.geonames.org/4515289/","id":"http://sws.geonames.org/4515289/","label":"Ithaca (US)"}]
AGROVOC
    [{"uri":"http://aims.fao.org/aos/agrovoc/c_8602","id":"http://aims.fao.org/aos/agrovoc/c_8602","label":"acidophilus milk"},
     {"uri":"http://aims.fao.org/aos/agrovoc/c_16076","id":"http://aims.fao.org/aos/agrovoc/c_16076","label":"buffalo milk"}]
Second Set of Challenges
5. Reliability & Efficiency, e.g., server uptime, server load
6. Accuracy, e.g., select results based on usage data, lexical match, custom weighting, other
7. Order Ranking, e.g., how to order a graph?
Cache Server Query Process
One full setup per authority: JSP Query API, Lucene/SOLR Index, Jena-Fuseki Triplestore
http://services.ld4l.org/ld4l_services/loc_name_batch.jsp?query=ezra%20cornell&maxRecords=10
1. Lucene search for "ezra cornell" (index built with predicate values <skos:prefLabel> and <skos:altLabel>)
2. For each result: extract search rank, extract URI
3. SPARQL query for URI against the Jena-Fuseki triplestore
4. Insert search rank in predicate <http://vivoweb.org/ontology/core#rank>
5. Combine all results
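The query process above can be walked through with in-memory stand-ins. This is only a sketch under stated assumptions: the real cache server uses Lucene, Jena-Fuseki, and JSP, whereas here `lucene_search` is a toy token-overlap scorer, `describe` stands in for the SPARQL lookup, and the `example.org` URIs are placeholders rather than real authority URIs.

```python
RANK = "http://vivoweb.org/ontology/core#rank"
PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"

# Stand-in for both the index and the triplestore: placeholder URI -> label.
LABELS = {
    "http://example.org/names/1": "Cornell, Ezra",
    "http://example.org/names/2": "Cornell, Alonzo",
    "http://example.org/names/3": "Twain, Mark",
}


def lucene_search(query, max_records):
    """Toy replacement for the Lucene search: score = query tokens found in label."""
    tokens = query.lower().split()
    hits = []
    for uri, label in LABELS.items():
        words = label.lower().replace(",", " ").split()
        score = sum(1 for token in tokens if token in words)
        if score:
            hits.append((uri, score))
    hits.sort(key=lambda hit: -hit[1])  # best score first; sort is stable
    return hits[:max_records]


def describe(uri):
    """Toy replacement for the SPARQL query that fetches triples for one URI."""
    return [(uri, PREF_LABEL, LABELS[uri])]


def cache_query(query, max_records=10):
    """Search, fetch triples per hit, attach the search rank, combine everything."""
    combined = []
    hits = lucene_search(query, max_records)
    for rank, (uri, _score) in enumerate(hits, start=1):
        combined.extend(describe(uri))
        combined.append((uri, RANK, rank))  # rank travels inside the graph
    return combined


graph = cache_query("ezra cornell")
for triple in graph:
    print(triple)
```

Carrying the rank as a triple is one answer to the "how to order a graph" challenge: the graph itself has no order, so the client sorts on the rank predicate.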
UI-QA-Authority
Hyrax/VitroLib – UI for selecting an entry from an authority
    http://localhost:3000/qa/search/linked_data/oclc_fast/personal_name?q=twain&maximumRecords=2
    [{"uri":"http://id.worldcat.org/fast/31622","id":"31622","label":"Twain, Mark, 1835-1910"},
     {"uri":"http://id.worldcat.org/fast/365563","id":"365563","label":"Twain, Shania"}]
QA – normalize RDF returned from an authority
    http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&sortKeys=usage&maximumRecords=2
    RDF of search results
Direct Access of External Authority
Active-Triples LDF Cache (Marmotta or Blazegraph)
Jena-Fuseki-Lucene Cache – search of cache performed via Lucene index
Third Set of Challenges
8. Disambiguation through better context, e.g., expand from just prefLabel to… prefLabel, altLabel, birth/death dates, occupation, etc.
9. Reconciliation across multiple sources, e.g., match LoC URI to OCLC FAST URI
What's next?
Addressing Architectural Challenges
• Generalize process for accessing context on the cache server and in the normalization layer
• Multi-authority search and reconciliation
• Address the need for cache refresh
• Mirrored cache servers
User Experience and Design
• User-centered design
  • Observe, listen, learn, design, evaluate, iterate
• Iteratively design and evaluate UI for lookup/authorities with catalogers
• Search result ranking/ordering/filtering for catalogers
• Additional UI platforms, e.g., FOLIO
Questions?
http://tinyurl.com/ld4l-auth-access
Appendix for Challenges 1-4
Challenge 1: Documentation
LoC: http://id.loc.gov/techcenter/
    C. Harlow notes on reconciling LoC: https://github.com/cmh2166/lc-reconcile
OCLC FAST: https://www.oclc.org/developer/develop/web-services/fast-api/linked-data.en.html
GeoNames: http://www.geonames.org/export/geonames-search.html
AGROVOC: http://aims.fao.org/vest-registry/vocabularies/agrovoc-multilingual-agricultural-thesaurus
    swagger config: https://github.com/NatLibFi/Skosmos/blob/master/swagger.json
NALT: https://agclass.nal.usda.gov/
DBpedia: http://wiki.dbpedia.org/OnlineAccess#1.2%20Public%20Faceted%20Web%20Service%20Interface
Challenge 2: Linked Data Access API
LoC
    Search query: not supported
    Term fetch: URI
OCLC FAST
    Search query: http://experimental.worldcat.org/fast/search?query={subauth}+all+%22{query}%22&sortKeys=usage&maximumRecords={maximumRecords}
    Term fetch: URI
GeoNames
    Search query: http://api.geonames.org/search?q={query}&maxRows={maxRows}&username={username}&type=rdf
    Term fetch: URI
AGROVOC
    Search query: http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search?query={query}&lang={lang}
    Term fetch: http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/data?uri=http://aims.fao.org/aos/agrovoc/{term_id}
NALT
    Search query: http://skosmos.library.cornell.edu/rest/v1/nalt/search?query={query}&lang={lang}
    Term fetch: http://skosmos.library.cornell.edu/rest/v1/nalt/data?uri={term_uri}
DBpedia
Challenge 3: Varying Results Formats
Authority    Search Query     Term Fetch
LoC          not supported    rdf-xml
OCLC FAST    rdf-xml          rdf-xml
GeoNames     rdf-xml          rdf-xml
AGROVOC      json-ld          rdf-xml, json-ld, turtle
NALT         json-ld          rdf-xml, json-ld, turtle
DBpedia
Challenge 4: Varying Ontologies
Authority    Primary Ontology    Flat vs. Navigation Required
LoC          madsrdf, SKOS       navigation required
OCLC FAST    schema.org, SKOS    flat
GeoNames     geonames            flat, hierarchical
AGROVOC      SKOS                flat, hierarchical
NALT         SKOS                flat, hierarchical
DBpedia      dbpedia             flat
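One way to cope with varying ontologies is a per-authority table of label predicates, tried in order. The sketch below is illustrative only: the predicate choices in `LABEL_PREDICATES` are assumptions based on each ontology's commonly used label property, not the contents of the actual Questioning Authority configurations, and `pick_label` is a hypothetical helper.

```python
SKOS_PREF = "http://www.w3.org/2004/02/skos/core#prefLabel"

# Assumed label predicates per authority, in order of preference.
LABEL_PREDICATES = {
    "loc": ["http://www.loc.gov/mads/rdf/v1#authoritativeLabel", SKOS_PREF],
    "oclc_fast": [SKOS_PREF, "http://schema.org/name"],
    "geonames": ["http://www.geonames.org/ontology#name"],
    "agrovoc": [SKOS_PREF],
    "nalt": [SKOS_PREF],
    "dbpedia": ["http://www.w3.org/2000/01/rdf-schema#label"],
}


def pick_label(authority, triples):
    """Return the first label found, trying the authority's predicates in order."""
    for predicate in LABEL_PREDICATES[authority]:
        for _subject, found_predicate, obj in triples:
            if found_predicate == predicate:
                return obj
    return None  # no known label predicate present


fast_triples = [
    ("http://id.worldcat.org/fast/31622", SKOS_PREF, "Twain, Mark, 1835-1910"),
]
geonames_triples = [
    ("http://sws.geonames.org/4515289/", "http://www.geonames.org/ontology#name", "Ithaca"),
]

print(pick_label("oclc_fast", fast_triples))
print(pick_label("geonames", geonames_triples))
```

Keeping the mapping in data rather than code means a new authority needs a new table entry, not a new code path.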
Configurations for Questioning Authority
LoC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_loc/config/authorities/linked_data
OCLC FAST: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_oclcfast/config/authorities/linked_data
GeoNames: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_geonames/config/authorities/linked_data
AGROVOC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_agrovoc/config/authorities/linked_data
NALT: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_nalt/config/authorities/linked_data
DBpedia: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_dbpedia/config/authorities/linked_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
• 8-core, 64 GB, 3 GHz Mac Pro (late 2013), macOS Sierra (10.12.6)
• 32 TB Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
• Apache Jena Fuseki 2.4.0 provides SPARQL endpoint
• Apache Tomcat 9.0 runs custom web application(s)
• Apache Lucene 3.6 provides search interface
Customizations
• custom per-data-source JSP web application provides search/browse/download functionality
• custom (generic) SPARQL Tag Library provides API for web apps (available at https://github.com/eichmann/lod-utilities)
• custom (generic) Lucene Tag Library provides API for web apps
Loading a New Vocabulary
• download RDF
• if necessary, convert to n-triples (required for GeoNames data, for instance)
• use tdbloader2 to populate the triplestore
• configure Fuseki server(s) with triplestore details
• create new JSP project in Eclipse
• write one or more indexer programs that populate Lucene indices, and run the indexer(s)
• write search/browse/download application logic using the SPARQL and Lucene tags
• package project as war
• deploy to Apache Tomcat server(s)
• add new service to Apache HTTPD virtual host specification
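The indexer step in the list above can be illustrated in Python. This is not the Lucene-based indexer used on the cache server; it is a minimal stand-in that assumes simple N-Triples lines with literal objects and builds a token-to-URI inverted index from skos:prefLabel and skos:altLabel values only. The names `build_index` and `NT_LITERAL` are hypothetical.

```python
import re
from collections import defaultdict

SKOS = "http://www.w3.org/2004/02/skos/core#"
INDEXED_PREDICATES = {SKOS + "prefLabel", SKOS + "altLabel"}

# Matches N-Triples lines of the form: <subject> <predicate> "literal" .
# (optionally with a language tag or datatype after the literal)
NT_LITERAL = re.compile(
    r'^<([^>]+)>\s+<([^>]+)>\s+"((?:[^"\\]|\\.)*)"(?:@[\w-]+|\^\^<[^>]+>)?\s*\.\s*$'
)


def build_index(lines):
    """Map each lowercase label token to the set of subject URIs that use it."""
    index = defaultdict(set)
    for line in lines:
        match = NT_LITERAL.match(line.strip())
        if not match:
            continue  # skip blank lines, comments, and non-literal objects
        subject, predicate, literal = match.groups()
        if predicate in INDEXED_PREDICATES:
            for token in re.findall(r"\w+", literal.lower()):
                index[token].add(subject)
    return index


NT_LINES = [
    '<http://id.worldcat.org/fast/31622> <http://www.w3.org/2004/02/skos/core#prefLabel> "Twain, Mark, 1835-1910" .',
    '<http://id.worldcat.org/fast/31622> <http://purl.org/dc/terms/identifier> "31622" .',
    '<http://id.worldcat.org/fast/365563> <http://www.w3.org/2004/02/skos/core#prefLabel> "Twain, Shania" .',
    '<http://id.worldcat.org/fast/365563> <http://www.w3.org/2004/02/skos/core#altLabel> "Edwards, Eilleen" .',
]

index = build_index(NT_LINES)
print(sorted(index["twain"]))
```

Indexing alternate labels alongside preferred labels is what lets a search for a variant name still find the entry.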
UI Access to Cache Server
http://services.ld4l.org/ld4l_services/loc_name.jsp
Downloads
LoC: http://id.loc.gov/download/ (n-triples or rdf-xml)
OCLC FAST: http://www.oclc.org/research/themes/data-science/fast/download.html (n-triples)
GeoNames: http://www.geonames.org/ontology/documentation.html (custom format – see notes for processing)
AGROVOC: https://aims-fao.atlassian.net/wiki/spaces/AGV/pages/2949126/Releases (n-triples or rdf-xml)
NALT: https://agclass.nal.usda.gov/download.shtml (rdf-xml)
DBpedia: http://wiki.dbpedia.org/downloads-2016-04
Potential Options for Reconciliation
• VIAF for name reconciliation – we are doing some work with this
• Wikidata – I've heard that they are working on reconciliation issues, but haven't yet explored in depth
  • Intro Video (3 hrs)
  • API Access
  • SPARQL – user manual
  • federated queries with other authorities
Doing a Google search for "linked data reconciliation" returns a large number of articles and presentations on this concept.
Links to Code & More
• QA Server - Code for a small app that provides the Questioning Authority normalization layer
• Linked Data Authorities - Configurations that can be used with QA Server
• LD4L Services - UI access to Cache Server
• VitroLib - Code for the VitroLib cataloging tool
ltdatauri=term_uri
DBpedia
Challenge 3 Varying Results Formats
60
for Search Query for Term Fetch
LoC not supported rdf-xml
OCLC FAST rdf-xml rdf-xml
GeoNames rdf-xml rdf-xml
AGROVOC json-ld rdf-xml json-ld turtle
NALT json-ld rdf-xml json-ld turtle
DBpedia
Challenge 4 Varying Ontologies
61
Primary Ontology Flat vs Navigation
required
LoC madsrdf
SKOS
navigation required
OCLC FAST schemaorg
SKOS
flat
GeoNames geonames flat
hierarchical
AGROVOC SKOS flat
hierarchical
NALT SKOS flat
hierarchical
DBpedia dbpedia flat
Configurations for Questioning Authority
62
LoC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_locconfigauthoritieslinked_dat
a
OCLC FAST httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_oclcfastconfigauthoritieslinked
_data
GeoNames httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_geonamesconfigauthoritieslink
ed_data
AGROVOC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_agrovocconfigauthoritieslinked
_data
NALT httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_naltconfigauthoritieslinked_dat
a
DBpedia httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_dbpediaconfigauthoritieslinked
_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
bull 8-core 64gb 3Ghz Mac Pro (late 2013) macOS Sierra
(10126)
bull 32tb Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
bull Apache Jena Fuseki 240 provides SPARQL endpoint
bull Apache Tomcat 90 runs custom web application(s)
bull Apache Lucene 36 provides search interface
64
Customizations
bull custom per-data-source JSP web application provides
searchbrowsedownload functionality
bull custom (generic) SPARQL Tag Library provides API for web
apps (available at httpsgithubcomeichmannlod-utilities)
bull custom (generic) Lucene Tag Library provides API for web apps
65
Loading a New Vocabulary
bull download RDF
bull if necessary convert to n-triples (required for GeoNames data for instance)
bull use tdbloader2 to populated triplestore
bull configure Fuseki server(s) with triplestore details
bull create new JSP project in Eclipse
bull write one or more indexer programs that populate Lucene indices and run indexer(s)
bull write searchbrowsedownload application logic using the SPARQL and Lucene tags
bull package project as war
bull deploy to Apache Tomcat server(s)
bull add new service to Apache HTTPD virtual host specification
66
UI Access to Cache Server
httpservicesld4lorgld4l_servicesloc_namejsp
Downloads
68
LoC httpidlocgovdownload (n-triples OR rdf-xml)
OCLC FAST httpwwwoclcorgresearchthemesdata-sciencefastdownloadhtml (n-triples)
GeoNames httpwwwgeonamesorgontologydocumentationhtml (custom format ndash see notes for processing)
AGROVOC httpsaims-faoatlassiannetwikispacesAGVpages2949126Releases (n-triples OR rdf-xml)
NALT httpsagclassnalusdagovdownloadshtml (rdf-xml)
DBpedia httpwikidbpediaorgdownloads-2016-04
Potential Options for Reconciliation
bull VIAF for name reconciliation ndash we are doing some
work with this
bull Wikidata ndash Ive heard that they are working on
Reconciliation issues but havent yet explored in
depth bull Intro Video (3hrs)
bull API Access
bull SPARQL ndash user manual
bull federated queries with other authorities
Doing a google search for linked data reconciliation
returns a large number of articles and presentations
on this concept
Links to Code amp More
bull QA Server - Code for a small app that provides the
Questioning Authority normalization layer
bull Linked Data Authorities - Configurations that can
be used with QA Server
bull LD4L Services - UI access to Cache Server
bull VitroLib - Code for the VitroLib cataloging tool
VitroLib Demo
What just happened?
Components: VitroLib Search Service, Questioning Authority, "MAGIC (To Be Explained)", LOC Genre Forms
• Search LOC Genre Form data (Query = animation)
• Translate to QA Service Request
Result with context:
  uri: http://id.loc.gov/gf2011026141
  label: "Clay animation television programs"
  context: "Alternate Label": [ "Claymation television programs", "Sculptmation television programs" ] …
Result with altLabelList:
  uri: http://id.loc.gov/gf2011026141
  label: "Clay animation television programs"
  altLabelList: [ "Claymation television programs", "Sculptmation television programs" ] …
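The two result shapes on this slide differ only in where the alternate labels live. As a minimal sketch of that translation step (not VitroLib's actual code), the function below flattens a result carrying a "context" block into one carrying "altLabelList". The function name and the dictionary layout are assumptions made for illustration; only the field names come from the slide.

```python
def qa_to_vitrolib(result):
    """Flatten one result with a 'context' block into the altLabelList shape."""
    context = result.get("context", {})
    return {
        "uri": result["uri"],
        "label": result["label"],
        # "Alternate Label" is the context key shown on the slide.
        "altLabelList": list(context.get("Alternate Label", [])),
    }


# Example data copied from the slide.
qa_result = {
    "uri": "http://id.loc.gov/gf2011026141",
    "label": "Clay animation television programs",
    "context": {
        "Alternate Label": [
            "Claymation television programs",
            "Sculptmation television programs",
        ]
    },
}

vitrolib_result = qa_to_vitrolib(qa_result)
```

A result without any context simply yields an empty altLabelList, so the UI code can treat both cases the same way.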
Hyrax Demo
Autocomplete: Saving String and URI
Authority: OCLC FAST; Subauthority: PersonName
Selected string and URI
Saves both string and URI
Selecting a Term using Lookup with Context
Getting more from the same authority
Getting more from other authorities
Architecture
Technical Motivation
• Linked data provides…
  • URIs that identify specific terms (as opposed to the ambiguity of using strings)
  • Reconciliation to relate terms that are defined in separate authorities
• Goals of implementation…
  • Provide a single process to access many authorities
  • Provide efficient and reliable access to authorities
  • Provide a means for disambiguation that empowers library staff to make the most accurate selections
First Set of Challenges
1. Finding Documentation
2. Linked Data Access API, e.g., no support, partial support, requires login credentials, SPARQL query endpoint only
3. Varying Results Formats, e.g., rdf-xml, json-ld, turtle, n-triples, etc.
4. Varying Ontologies, e.g., SKOS, schema.org, madsrdf, dbpedia, geonames
Multi-Server Architecture
Hyrax/VitroLib – UI for selecting an entry from an authority
  Request to QA:
    http://localhost:3000/qa/search/linked_data/oclc_fast/personal_name?q=twain&maximumRecords=2
  Normalized results returned by QA:
    [{"uri":"http://id.worldcat.org/fast/31622","id":"31622","label":"Twain, Mark, 1835-1910"},
     {"uri":"http://id.worldcat.org/fast/365563","id":"365563","label":"Twain, Shania"}]
QA – normalize RDF returned from an authority
  Request to the authority:
    http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&sortKeys=usage&maximumRecords=2
  RDF returned by the authority:
    <http://id.worldcat.org/fast/31622>
      a schema:Person ;
      dcterms:identifier "31622" ;
      skos:prefLabel "Twain, Mark, 1835-1910" ;
      skos:altLabel "Make Teviin, 1835-1910",
                    "Make Tuwen, 1835-1910" .
    <http://id.worldcat.org/fast/365563>
      a schema:Person ;
      dcterms:identifier "365563" ;
      skos:prefLabel "Twain, Shania" ;
      skos:altLabel "Twain, Eilleen",
                    "Edwards, Eilleen" .
Direct Access of External Authority
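To make the normalization step concrete, here is a minimal sketch that reduces RDF triples like those on the slide to the uri/id/label records the UI receives. It is not the Questioning Authority implementation: the triples are hand-written tuples rather than parsed RDF, and the function name is an assumption. The two predicate URIs are the standard SKOS and Dublin Core Terms ones.

```python
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
SKOS_ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"
DCTERMS_IDENTIFIER = "http://purl.org/dc/terms/identifier"


def normalize(triples):
    """Collapse (subject, predicate, object) triples into one record per subject."""
    by_subject = {}
    for subject, predicate, obj in triples:
        record = by_subject.setdefault(subject, {"uri": subject})
        if predicate == DCTERMS_IDENTIFIER:
            record["id"] = obj
        elif predicate == SKOS_PREF_LABEL:
            record["label"] = obj
        # Other predicates (e.g. skos:altLabel) are ignored in this minimal form.
    return list(by_subject.values())


# Triples transcribed from the slide.
fast_triples = [
    ("http://id.worldcat.org/fast/31622", DCTERMS_IDENTIFIER, "31622"),
    ("http://id.worldcat.org/fast/31622", SKOS_PREF_LABEL, "Twain, Mark, 1835-1910"),
    ("http://id.worldcat.org/fast/31622", SKOS_ALT_LABEL, "Make Teviin, 1835-1910"),
    ("http://id.worldcat.org/fast/365563", DCTERMS_IDENTIFIER, "365563"),
    ("http://id.worldcat.org/fast/365563", SKOS_PREF_LABEL, "Twain, Shania"),
]

normalized = normalize(fast_triples)
```

Because the output shape is the same for every authority, the UI only ever has to understand uri, id, and label.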
Direct Access Query API
Direct against authority…
http://experimental.worldcat.org/fast/search
  ?query=oclc.personalName+%22twain%22
  &maximumRecords=2
http://api.geonames.org/search
  ?q=ithaca
  &maxRows=2
  &username=demo
  &type=rdf
http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search
  ?query=milk
  &lang=en
  &maxhits=2
Normalized Query API
Through QA normalization layer…
http://localhost:3000/qa/search/linked_data/oclc_fast
  ?q=twain
  &maxRecords=2
http://localhost:3000/qa/search/linked_data/geonames
  ?q=ithaca
  &maxRecords=2
http://localhost:3000/qa/search/linked_data/agrovoc
  ?q=milk
  &maxRecords=2
  &lang=en
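The contrast between the direct and normalized query forms can be sketched as a small translation table: one common call (authority, query, maximum records) is turned into each authority's own URL. This is an illustrative sketch only, not how Questioning Authority is configured; the function names and the BUILDERS table are assumptions, while the endpoints and parameter names are the ones shown on the slides.

```python
from urllib.parse import urlencode


def fast_url(q, max_records, subauth="oclc.personalName"):
    # OCLC FAST expects an index name followed by the quoted term.
    params = {"query": f'{subauth} "{q}"', "maximumRecords": max_records}
    return "http://experimental.worldcat.org/fast/search?" + urlencode(params)


def geonames_url(q, max_records, username="demo"):
    # GeoNames names the limit maxRows and requires a username.
    params = {"q": q, "maxRows": max_records, "username": username, "type": "rdf"}
    return "http://api.geonames.org/search?" + urlencode(params)


def agrovoc_url(q, max_records, lang="en"):
    # The AGROVOC REST endpoint names the limit maxhits and takes a language.
    params = {"query": q, "lang": lang, "maxhits": max_records}
    return "http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search?" + urlencode(params)


BUILDERS = {"oclc_fast": fast_url, "geonames": geonames_url, "agrovoc": agrovoc_url}


def direct_url(authority, q, max_records):
    """Translate a normalized (authority, q, max_records) query into a direct URL."""
    return BUILDERS[authority](q, max_records)


twain = direct_url("oclc_fast", "twain", 2)
ithaca = direct_url("geonames", "ithaca", 2)
milk = direct_url("agrovoc", "milk", 2)
```

The caller never needs to know that the same limit is spelled maximumRecords, maxRows, or maxhits depending on the authority.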
Normalized Results
OCLC FAST:
  [{"uri":"http://id.worldcat.org/fast/31622","id":"31622","label":"Twain, Mark, 1835-1910"},
   {"uri":"http://id.worldcat.org/fast/365563","id":"365563","label":"Twain, Shania"}]
GeoNames:
  [{"uri":"http://sws.geonames.org/2162552/","id":"http://sws.geonames.org/2162552/","label":"Ithaca (AU)"},
   {"uri":"http://sws.geonames.org/4515289/","id":"http://sws.geonames.org/4515289/","label":"Ithaca (US)"}]
AGROVOC:
  [{"uri":"http://aims.fao.org/aos/agrovoc/c_8602","id":"http://aims.fao.org/aos/agrovoc/c_8602","label":"acidophilus milk"},
   {"uri":"http://aims.fao.org/aos/agrovoc/c_16076","id":"http://aims.fao.org/aos/agrovoc/c_16076","label":"buffalo milk"}]
Second Set of Challenges
5. Reliability & Efficiency, e.g., server uptime, server load
6. Accuracy, e.g., select results based on usage data, lexical match, custom weighting, other
7. Order/Ranking, e.g., how to order a graph?
Cache Server Query Process
Request to the JSP Query API:
  http://services.ld4l.org/ld4l_services/loc_name_batch.jsp?query=ezra%20cornell&maxRecords=10
1. Lucene search for "ezra cornell" against the Lucene/SOLR index (index built with predicate values <skos:prefLabel> and <skos:altLabel>)
2. For each result: extract URI, extract search rank
3. SPARQL query for URI against the Jena-Fuseki triplestore
4. Insert search rank in predicate <http://vivoweb.org/ontology/core#rank>
5. Combine all results
One full setup per authority
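The query process above can be sketched end to end with in-memory stand-ins. This is only an illustration under loud assumptions: a word-match counter stands in for Lucene, a list filter stands in for the SPARQL query against Fuseki, and the example.org URIs and all function names are hypothetical. Only the sequence of steps and the rank predicate come from the slide.

```python
RANK_PREDICATE = "http://vivoweb.org/ontology/core#rank"
PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def search_index(index, query, max_records):
    """Stand-in for Lucene: score = number of query words found in the labels."""
    words = query.lower().split()
    scored = []
    for uri, labels in index.items():
        text = " ".join(labels).lower()
        score = sum(1 for word in words if word in text)
        if score:
            scored.append((score, uri))
    scored.sort(key=lambda pair: -pair[0])  # best score first; ties keep index order
    return [uri for _, uri in scored[:max_records]]


def fetch_triples(store, uri):
    """Stand-in for the SPARQL query that retrieves the graph for one URI."""
    return [triple for triple in store if triple[0] == uri]


def query_cache(index, store, query, max_records=10):
    combined = []
    for rank, uri in enumerate(search_index(index, query, max_records), start=1):
        triples = fetch_triples(store, uri)
        # Record the search rank in the graph so the ordering survives.
        triples.append((uri, RANK_PREDICATE, str(rank)))
        combined.extend(triples)
    return combined


# Hypothetical cache contents; the example.org URIs are placeholders.
store = [
    ("http://example.org/auth/1", PREF_LABEL, "Cornell University"),
    ("http://example.org/auth/2", PREF_LABEL, "Cornell, Ezra, 1807-1874"),
]
index = {
    "http://example.org/auth/1": ["Cornell University"],
    "http://example.org/auth/2": ["Cornell, Ezra, 1807-1874"],
}

results = query_cache(index, store, "ezra cornell")
```

Writing the rank into the graph is what answers the "how to order a graph" question: RDF has no inherent ordering, so the consumer sorts on the rank value instead.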
UI-QA-Authority
Hyrax/VitroLib – UI for selecting an entry from an authority
  Request to QA:
    http://localhost:3000/qa/search/linked_data/oclc_fast/personal_name?q=twain&maximumRecords=2
  Normalized results returned by QA:
    [{"uri":"http://id.worldcat.org/fast/31622","id":"31622","label":"Twain, Mark, 1835-1910"},
     {"uri":"http://id.worldcat.org/fast/365563","id":"365563","label":"Twain, Shania"}]
QA – normalize RDF returned from an authority
  Request to the authority:
    http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&sortKeys=usage&maximumRecords=2
  RDF of search results
  Active-Triples
Sources behind QA:
  LDF Cache (Marmotta or Blazegraph)
  Jena-Fuseki-Lucene Cache (search of cache performed via Lucene index)
  Direct Access of External Authority
Third Set of Challenges
8. Disambiguation through better context, e.g., expand from just prefLabel to… prefLabel, altLabel, birth/death dates, occupation, etc.
9. Reconciliation across multiple sources, e.g., match LoC URI to OCLC FAST URI
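As a small illustration of challenge 8, the sketch below builds a display string that shows whatever context is available next to the preferred label. The context keys, the separator characters, and the function name are all assumptions for illustration; the alternate labels are the ones shown for Shania Twain in the architecture slides.

```python
# Hypothetical context keys, in the order they should be displayed.
CONTEXT_KEYS = ("Alternate Label", "Birth/death dates", "Occupation")


def display_label(result):
    """Return the label followed by any available context, for disambiguation."""
    context = result.get("context", {})
    parts = []
    for key in CONTEXT_KEYS:
        values = context.get(key)
        if values:
            # " | " separates values because the labels themselves contain commas.
            parts.append(f"{key}: {' | '.join(values)}")
    if not parts:
        return result["label"]
    return f"{result['label']} ({'; '.join(parts)})"


shania = {
    "label": "Twain, Shania",
    "context": {"Alternate Label": ["Twain, Eilleen", "Edwards, Eilleen"]},
}

shown = display_label(shania)
```

A result with no context falls back to the bare label, so the same function works for authorities that only supply prefLabel.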
What's next
Addressing Architectural Challenges
• Generalize process for accessing context on the cache server and in the normalization layer
• Multi-authority search and reconciliation
• Address the need for cache refresh
• Mirrored cache servers
User Experience and Design
• User-centered Design
  • Observe, listen, learn, design, evaluate, iterate
• Iteratively design and evaluate UI for lookup/authorities with catalogers
• Search result ranking/ordering/filtering for catalogers
• Additional UI platforms, e.g., FOLIO
Questions
http://tinyurl.com/ld4l-auth-access
Appendix for Challenges 1-4
Challenge 1: Documentation
LoC: http://id.loc.gov/techcenter
  C. Harlow notes on reconciling LoC - https://github.com/cmh2166/lc-reconcile
OCLC FAST: https://www.oclc.org/developer/develop/web-services/fast-api/linked-data.en.html
GeoNames: http://www.geonames.org/export/geonames-search.html
AGROVOC: http://aims.fao.org/vest-registry/vocabularies/agrovoc-multilingual-agricultural-thesaurus
  swagger config: https://github.com/NatLibFi/Skosmos/blob/master/swagger.json
NALT: https://agclass.nal.usda.gov
DBpedia: http://wiki.dbpedia.org/OnlineAccess#1.2%20Public%20Faceted%20Web%20Service%20Interface
Challenge 2: Linked Data Access API
LoC
  for Search Query: not supported
  for Term Fetch: URI
OCLC FAST
  for Search Query: http://experimental.worldcat.org/fast/search?query={subauth}+all+%22{query}%22&sortKeys=usage&maximumRecords={maximumRecords}
  for Term Fetch: URI
GeoNames
  for Search Query: http://api.geonames.org/search?q={query}&maxRows={maxRows}&username={username}&type=rdf
  for Term Fetch: URI
AGROVOC
  for Search Query: http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search?query={query}&lang={lang}
  for Term Fetch: http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/data?uri=http://aims.fao.org/aos/agrovoc/{term_id}
NALT
  for Search Query: http://skosmos.library.cornell.edu/rest/v1/nalt/search?query={query}&lang={lang}
  for Term Fetch: http://skosmos.library.cornell.edu/rest/v1/nalt/data?uri={term_uri}
DBpedia
  (no entry)
Challenge 3: Varying Results Formats
Authority    for Search Query    for Term Fetch
LoC          not supported       rdf-xml
OCLC FAST    rdf-xml             rdf-xml
GeoNames     rdf-xml             rdf-xml
AGROVOC      json-ld             rdf-xml, json-ld, turtle
NALT         json-ld             rdf-xml, json-ld, turtle
DBpedia      (no entry)          (no entry)
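One way to cope with the varying formats is to keep the table as data and pick the best format each authority offers for a term fetch. The sketch below does that and returns the matching media type. The media types are the registered ones for each serialization; the preference order, the function name, and the idea that the result would be sent as an HTTP Accept header are assumptions, since the slides do not say how each authority selects its output format.

```python
MIME_TYPES = {
    "rdf-xml": "application/rdf+xml",
    "json-ld": "application/ld+json",
    "turtle": "text/turtle",
    "n-triples": "application/n-triples",
}

# Term-fetch formats per authority, transcribed from the table above.
TERM_FETCH_FORMATS = {
    "LoC": ["rdf-xml"],
    "OCLC FAST": ["rdf-xml"],
    "GeoNames": ["rdf-xml"],
    "AGROVOC": ["rdf-xml", "json-ld", "turtle"],
    "NALT": ["rdf-xml", "json-ld", "turtle"],
}


def preferred_media_type(authority, preferred=("json-ld", "turtle", "rdf-xml")):
    """Return the media type of the first preferred format the authority supports."""
    supported = TERM_FETCH_FORMATS[authority]
    for fmt in preferred:
        if fmt in supported:
            return MIME_TYPES[fmt]
    raise ValueError(f"no supported format for {authority}")


agrovoc_type = preferred_media_type("AGROVOC")
loc_type = preferred_media_type("LoC")
```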
Challenge 4: Varying Ontologies
Authority    Primary Ontology    Flat vs Navigation required
LoC          madsrdf, SKOS       navigation required
OCLC FAST    schema.org, SKOS    flat
GeoNames     geonames            flat, hierarchical
AGROVOC      SKOS                flat, hierarchical
NALT         SKOS                flat, hierarchical
DBpedia      dbpedia             flat
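Varying ontologies mean the label lives under a different predicate in each authority's RDF. A minimal sketch of handling that: map each ontology named in the table to a label predicate and take the first match. The predicate chosen for each ontology is an assumption made for illustration (the slides do not list them), as are the function name and the sample triples.

```python
# Assumed label predicate per ontology; not taken from the slides.
LABEL_PREDICATES = {
    "madsrdf": "http://www.loc.gov/mads/rdf/v1#authoritativeLabel",
    "SKOS": "http://www.w3.org/2004/02/skos/core#prefLabel",
    "schema.org": "http://schema.org/name",
    "geonames": "http://www.geonames.org/ontology#name",
    "dbpedia": "http://www.w3.org/2000/01/rdf-schema#label",
}


def pick_label(triples, ontologies):
    """Return the first label found, trying the ontologies in the given order."""
    for ontology in ontologies:
        predicate = LABEL_PREDICATES[ontology]
        for _, p, obj in triples:
            if p == predicate:
                return obj
    return None


# Hypothetical triples for one GeoNames place.
ithaca_triples = [
    ("http://sws.geonames.org/4515289/", "http://www.geonames.org/ontology#name", "Ithaca"),
    ("http://sws.geonames.org/4515289/", "http://www.geonames.org/ontology#countryCode", "US"),
]

ithaca_label = pick_label(ithaca_triples, ["geonames"])
```

For an authority with two ontologies, such as OCLC FAST, the caller would pass both names and get whichever label is present first.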
Configurations for Questioning Authority
LoC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_loc/config/authorities/linked_data
OCLC FAST: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_oclcfast/config/authorities/linked_data
GeoNames: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_geonames/config/authorities/linked_data
AGROVOC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_agrovoc/config/authorities/linked_data
NALT: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_nalt/config/authorities/linked_data
DBpedia: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_dbpedia/config/authorities/linked_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
• 8-core, 64 GB, 3 GHz Mac Pro (late 2013), macOS Sierra (10.12.6)
• 32 TB Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
• Apache Jena Fuseki 2.4.0 provides SPARQL endpoint
• Apache Tomcat 9.0 runs custom web application(s)
• Apache Lucene 3.6 provides search interface
Customizations
• custom per-data-source JSP web application provides search/browse/download functionality
• custom (generic) SPARQL Tag Library provides API for web apps (available at https://github.com/eichmann/lod-utilities)
• custom (generic) Lucene Tag Library provides API for web apps
Loading a New Vocabulary
• download RDF
• if necessary, convert to n-triples (required for GeoNames data, for instance)
• use tdbloader2 to populate the triplestore
• configure Fuseki server(s) with triplestore details
• create new JSP project in Eclipse
• write one or more indexer programs that populate Lucene indices, and run the indexer(s)
• write search/browse/download application logic using the SPARQL and Lucene tags
• package project as WAR
• deploy to Apache Tomcat server(s)
• add new service to Apache HTTPD virtual host specification
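The two triplestore steps in the list above can be sketched as command lines. The snippet only builds the argument lists for Jena's tdbloader2 and fuseki-server tools and does not run them; the directory, file, and dataset names are hypothetical, and the options a real deployment needs (memory settings, ports, service configuration) are not covered by the slides.

```python
def tdbloader2_command(tdb_dir, ntriples_files):
    """Arguments for bulk-loading n-triples files into a TDB directory."""
    return ["tdbloader2", "--loc", tdb_dir, *ntriples_files]


def fuseki_command(tdb_dir, dataset_name):
    """Arguments for serving an existing TDB directory as a Fuseki dataset."""
    return ["fuseki-server", f"--loc={tdb_dir}", f"/{dataset_name}"]


# Hypothetical paths and dataset name, for illustration only.
load_cmd = tdbloader2_command("/data/tdb/loc_names", ["/data/rdf/loc_names.nt"])
serve_cmd = fuseki_command("/data/tdb/loc_names", "loc_names")

# To actually run a step, pass the list to subprocess.run(load_cmd, check=True).
```

Since the deck notes "one full setup per authority", each vocabulary would get its own TDB directory and dataset name.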
UI Access to Cache Server
http://services.ld4l.org/ld4l_services/loc_name.jsp
Downloads
LoC: http://id.loc.gov/download (n-triples OR rdf-xml)
OCLC FAST: http://www.oclc.org/research/themes/data-science/fast/download.html (n-triples)
GeoNames: http://www.geonames.org/ontology/documentation.html (custom format – see notes for processing)
AGROVOC: https://aims-fao.atlassian.net/wiki/spaces/AGV/pages/2949126/Releases (n-triples OR rdf-xml)
NALT: https://agclass.nal.usda.gov/download.shtml (rdf-xml)
DBpedia: http://wiki.dbpedia.org/downloads-2016-04
Potential Options for Reconciliation
• VIAF for name reconciliation – we are doing some work with this
• Wikidata – I've heard that they are working on reconciliation issues, but haven't yet explored in depth
  • Intro Video (3 hrs)
  • API Access
  • SPARQL – user manual
  • federated queries with other authorities
Doing a Google search for "linked data reconciliation" returns a large number of articles and presentations on this concept.
Links to Code & More
• QA Server - Code for a small app that provides the Questioning Authority normalization layer
• Linked Data Authorities - Configurations that can be used with QA Server
• LD4L Services - UI access to Cache Server
• VitroLib - Code for the VitroLib cataloging tool
9
What just happened
Questioning Authority
MAGIC (To Be Explained)
VitroLib Search Service
LOC Genre Forms
Search LOC Genre Form
data
Query = animation
Translate to QA Service
Request
urihttpidlocgovgf2011026141
label ldquoClay animation television programsrdquo
context
ldquoAlternate Labelrdquo [
ldquoClaymation television programsrdquo
ldquoSculptmation television programsrdquo
] hellip
urihttpidlocgovgf2011026141
label ldquoClay animation television programsrdquo
altLabelList [
ldquoClaymation television programsrdquo
ldquoSculptmation television programsrdquo
] hellip
23
Hyrax Demo
Autocomplete Saving String and URI
Authority OCLC FAST Subauthority PersonName
Selected String and URI
Saves both string and URI
Selecting a Term using
Lookup with Context
26
Selecting a Term using
Lookup with Context
27
Getting more from the same authority
Getting more from other authorities
30
Architecture
Technical Motivation
bull Linked data provideshellip
bull URIs that identify specific terms (as opposed to ambiguity of using
strings)
bull Reconciliation to relate terms that are defined in separate authorities
bull Goals of implementationhellip
bull Provide a single process to access many authorities
bull Provide efficient and reliable access to authorities
bull Provide a means for disambiguation that empowers library staff to
make the most accurate selections
First Set of Challenges
1 Finding Documentation
2 Linked Data Access API eg no support partial support requires login credentials
sparql query endpoint only
3 Varying Results Formats eg rdf-xml json-ld turtle n-triples etc
4 Varying Ontologies eg SKOS schemaorg madsrdf dbpedia geonames
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
httpexperimentalworldcatorgfast
searchquery=oclcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
httpexperimentalworldcatorgfast
searchquery=oclcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
lthttpidworldcatorgfast31622gt
a schemaPerson
dctermsidentifier 31622
skosprefLabel Twain Mark 1835-1910
skosaltLabel Make Teviin 1835-1910
Make Tuwen 1835-1910
lthttpidworldcatorgfast365563gt
a schemaPerson
dctermsidentifier 365563
skosprefLabel Twain Shania
skosaltLabel Twain Eilleen
Edwards Eilleen
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
[urihttpidworldcatorgfast31622
id31622 labelTwain Mark 1835-1910
urihttpidworldcatorgfast365563
id365563labelTwain Shania ]
httpexperimentalworldcatorgfast
searchquery=oclcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
lthttpidworldcatorgfast31622gt
a schemaPerson
dctermsidentifier 31622
skosprefLabel Twain Mark 1835-1910
skosaltLabel Make Teviin 1835-1910
Make Tuwen 1835-1910
lthttpidworldcatorgfast365563gt
a schemaPerson
dctermsidentifier 365563
skosprefLabel Twain Shania
skosaltLabel Twain Eilleen
Edwards Eilleen
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Direct Access Query API
Direct against authorityhellip
httpexperimentalworldcatorgfastsearch
query=oclcpersonalName+22twain22
ampmaximumRecords=2
httpapigeonamesorgsearch
q=ithaca
ampmaxRows=2
ampusername=demo
amptype=rdf
httpartemideartuniroma2it8081agrovocrestv1search
query=milk
amplang=en
ampmaxhits=2
Normalized Query API
Through QA normalization layerhellip
httplocalhost3000qasearchlinked_dataoclc_fast
q=twain
ampmaxRecords=2
httplocalhost3000qasearchlinked_datageonames
q=ithaca
ampmaxRecords=2
httplocalhost3000qasearchlinked_dataagrovoc
q=milk
ampmaxRecords=2
amplang=en
Normalized Results
[urihttpidworldcatorgfast31622 id31622 labelTwain Mark 1835-1910 urihttpidworldcatorgfast365563 id365563 labelTwain Shania]
[uri httpswsgeonamesorg2162552 id httpswsgeonamesorg2162552 label Ithaca (AU) uri httpswsgeonamesorg4515289 id httpswsgeonamesorg4515289 label Ithaca (US)]
[uri httpaimsfaoorgaosagrovocc_8602 id httpaimsfaoorgaosagrovocc_8602 label acidophilus milk uri httpaimsfaoorgaosagrovocc_16076 id httpaimsfaoorgaosagrovocc_16076 label buffalo milk]
OCLC FAST GeoNames AgroVoc
Second Set of Challenges
5 Reliability amp Efficiency eg server uptime server load
6 Accuracy eg select results based on usage data lexical match
custom weighting other
7 Order Ranking eg How to order a graph
Cache Server Query Process
JSP Query API
Jena-Fuseki
Triplestore
One full setup per authority
LuceneSOLR
Index
Cache Server Query Process
JSP Query API
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
Jena-Fuseki
Triplestore
LuceneSOLR
Index
One full setup per authority
Cache Server Query Process
JSP Query API
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
Jena-Fuseki
Triplestore
LuceneSOLR
Index
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
extract search rank
extract URI
Jena-Fuseki
Triplestore
for each result
LuceneSOLR
Index
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
sparql query for URI
Jena-Fuseki
Triplestore
LuceneSOLR
Index
extract search rank
extract URI
for each result
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
combine all results
Jena-Fuseki
Triplestore
insert search rank in predicate
lthttpvivoweborgontology
corerankgt
LuceneSOLR
Index
sparql query for URI extract search rank
extract URI
for each result
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
UI-QA-Authority
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainampmaximumRecords=2
[urihttpidworldcatorgfast31622id31622
labelTwain Mark 1835-1910
urihttpidworldcatorgfast365563id365563
labelTwain Shania
httpexperimentalworldcatorgfastsearchquery=o
clcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
RDF of
search
results
Active-Triples
LDF Cache
(Marmotta or
Blazegraph) LDF Cache Jena-Fuseki-
Lucene
Cache
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
search of cache performed via Lucene index
Third Set of Challenges
8 Disambiguation through better context eg expand from just prefLabel tohellip
preLabel altLabel birthdeath dates occupation etc
9 Reconciliation across multiple sources eg match LoC URI to OCLC FAST URI
53
Whatrsquos next
Addressing Architectural Challenges
bull Generalize process for accessing context on the
cache server and in the normalization layer
bull Multi-authority search and reconciliation
bull Address the need for cache refresh
bull Mirrored cache servers
User Experience and Design
bull User-centered Design
bull Observe listen learn design evaluate iterate
bull Iteratively design and evaluate UI for lookupauthorities
with catalogers
bull Search result rankingorderingfiltering for catalogers
bull Additional UI platforms eg FOLIO
56
Questions
httptinyurlcomld4l-auth-access
Appendix for Challenges 1-4
Challenge 1 Documentation
58
LoC httpidlocgovtechcenter
C Harlow notes on reconciling LoC - httpsgithubcomcmh2166lc-reconcile
OCLC FAST
httpswwwoclcorgdeveloperdevelopweb-servicesfast-apilinked-dataenhtml
GeoNames
httpwwwgeonamesorgexportgeonames-searchhtml
AGROVOC httpaimsfaoorgvest-registryvocabulariesagrovoc-multilingual-agricultural-thesaurus
swagger config httpsgithubcomNatLibFiSkosmosblobmasterswaggerjson
NALT
httpsagclassnalusdagov
DBpedia httpwikidbpediaorgOnlineAccess1220Public20Faceted20Web20Service20Inter
face
Challenge 2 Linked Data Access API
59
for Search Query for Term Fetch
LoC not supported URI
OCLC FAST httpexperimentalworldcatorgfastsearchq
uery=subauth+all+22query22ampsortK
eys=usageampmaximumRecords=maximumR
ecords
URI
GeoNames httpapigeonamesorgsearchq=queryamp
maxRows=maxRowsampusername=userna
meamptype=rdf
URI
AGROVOC httpartemideartuniroma2it8081agrovocr
estv1searchquery=queryamplang=lang
httpartemideartuniroma2it8081agrovo
crestv1datauri=httpaimsfaoorgaosa
grovocterm_id
NALT httpskosmoslibrarycornelledurestv1nalt
searchquery=queryamplang=lang
httpskosmoslibrarycornelledurestv1na
ltdatauri=term_uri
DBpedia
Challenge 3 Varying Results Formats
60
for Search Query for Term Fetch
LoC not supported rdf-xml
OCLC FAST rdf-xml rdf-xml
GeoNames rdf-xml rdf-xml
AGROVOC json-ld rdf-xml json-ld turtle
NALT json-ld rdf-xml json-ld turtle
DBpedia
Challenge 4 Varying Ontologies
61
Primary Ontology Flat vs Navigation
required
LoC madsrdf
SKOS
navigation required
OCLC FAST schemaorg
SKOS
flat
GeoNames geonames flat
hierarchical
AGROVOC SKOS flat
hierarchical
NALT SKOS flat
hierarchical
DBpedia dbpedia flat
Configurations for Questioning Authority
62
LoC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_locconfigauthoritieslinked_dat
a
OCLC FAST httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_oclcfastconfigauthoritieslinked
_data
GeoNames httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_geonamesconfigauthoritieslink
ed_data
AGROVOC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_agrovocconfigauthoritieslinked
_data
NALT httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_naltconfigauthoritieslinked_dat
a
DBpedia httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_dbpediaconfigauthoritieslinked
_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
bull 8-core 64gb 3Ghz Mac Pro (late 2013) macOS Sierra
(10126)
bull 32tb Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
bull Apache Jena Fuseki 240 provides SPARQL endpoint
bull Apache Tomcat 90 runs custom web application(s)
bull Apache Lucene 36 provides search interface
64
Customizations
bull custom per-data-source JSP web application provides
searchbrowsedownload functionality
bull custom (generic) SPARQL Tag Library provides API for web
apps (available at httpsgithubcomeichmannlod-utilities)
bull custom (generic) Lucene Tag Library provides API for web apps
65
Loading a New Vocabulary
bull download RDF
bull if necessary convert to n-triples (required for GeoNames data for instance)
bull use tdbloader2 to populated triplestore
bull configure Fuseki server(s) with triplestore details
bull create new JSP project in Eclipse
bull write one or more indexer programs that populate Lucene indices and run indexer(s)
bull write searchbrowsedownload application logic using the SPARQL and Lucene tags
bull package project as war
bull deploy to Apache Tomcat server(s)
bull add new service to Apache HTTPD virtual host specification
66
UI Access to Cache Server
httpservicesld4lorgld4l_servicesloc_namejsp
Downloads
68
LoC httpidlocgovdownload (n-triples OR rdf-xml)
OCLC FAST httpwwwoclcorgresearchthemesdata-sciencefastdownloadhtml (n-triples)
GeoNames httpwwwgeonamesorgontologydocumentationhtml (custom format ndash see notes for processing)
AGROVOC httpsaims-faoatlassiannetwikispacesAGVpages2949126Releases (n-triples OR rdf-xml)
NALT httpsagclassnalusdagovdownloadshtml (rdf-xml)
DBpedia httpwikidbpediaorgdownloads-2016-04
Potential Options for Reconciliation
• VIAF for name reconciliation – we are doing some work with this
• Wikidata – I've heard that they are working on reconciliation issues, but haven't yet explored in depth
  • Intro Video (3hrs)
  • API Access
  • SPARQL – user manual
  • federated queries with other authorities
Doing a Google search for "linked data reconciliation" returns a large number of articles and presentations on this concept.
Links to Code & More
• QA Server - Code for a small app that provides the Questioning Authority normalization layer
• Linked Data Authorities - Configurations that can be used with QA Server
• LD4L Services - UI access to Cache Server
• VitroLib - Code for the VitroLib cataloging tool
What just happened?
Components: VitroLib Search Service, Questioning Authority, MAGIC (To Be Explained), LOC Genre Forms
• Search LOC Genre Form data: Query = animation
• Translate to QA Service Request
Result with context:
  uri: http://id.loc.gov/…/gf2011026141
  label: "Clay animation television programs"
  context:
    "Alternate Label": [
      "Claymation television programs",
      "Sculptmation television programs"
    ] …
Result with altLabelList:
  uri: http://id.loc.gov/…/gf2011026141
  label: "Clay animation television programs"
  altLabelList: [
    "Claymation television programs",
    "Sculptmation television programs"
  ] …
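The two result shapes on this slide differ only in where the alternate labels live. A minimal sketch of the translation, assuming the context-based shape is what arrives from QA and the altLabelList shape is what the search service hands to the UI (the slide does not state the direction explicitly; the function name is illustrative):

```python
from typing import Any, Dict


def to_alt_label_list(result: Dict[str, Any]) -> Dict[str, Any]:
    """Move the "Alternate Label" context entry into a flat altLabelList field."""
    context = result.get("context", {})
    return {
        "uri": result["uri"],
        "label": result["label"],
        # A result without context simply gets an empty list.
        "altLabelList": list(context.get("Alternate Label", [])),
    }
```

Only the fields shown on the slide are handled; any other context entries would need their own mapping.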
Hyrax Demo
Autocomplete: Saving String and URI
• Authority: OCLC FAST, Subauthority: PersonName
• Selected String and URI
• Saves both string and URI
Selecting a Term using Lookup with Context
Getting more from the same authority
Getting more from other authorities
Architecture
Technical Motivation
• Linked data provides…
  • URIs that identify specific terms (as opposed to the ambiguity of using strings)
  • Reconciliation to relate terms that are defined in separate authorities
• Goals of implementation…
  • Provide a single process to access many authorities
  • Provide efficient and reliable access to authorities
  • Provide a means for disambiguation that empowers library staff to make the most accurate selections
First Set of Challenges
1. Finding Documentation
2. Linked Data Access API, e.g. no support, partial support, requires login credentials, sparql query endpoint only
3. Varying Results Formats, e.g. rdf-xml, json-ld, turtle, n-triples, etc.
4. Varying Ontologies, e.g. SKOS, schema.org, madsrdf, dbpedia, geonames
Multi-Server Architecture
Hyrax/VitroLib – UI for selecting an entry from an authority
QA – normalize RDF returned from an authority
Direct Access of External Authority
1. The UI sends a search request to QA:
   http://localhost:3000/qa/search/linked_data/oclc_fast/personal_name?q=twain&maximumRecords=2
2. QA queries the external authority:
   http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&sortKeys=usage&maximumRecords=2
3. The authority returns RDF:
   <http://id.worldcat.org/fast/31622>
     a schema:Person ;
     dcterms:identifier "31622" ;
     skos:prefLabel "Twain, Mark, 1835-1910" ;
     skos:altLabel "Make Teviin, 1835-1910" ,
                   "Make Tuwen, 1835-1910" .
   <http://id.worldcat.org/fast/365563>
     a schema:Person ;
     dcterms:identifier "365563" ;
     skos:prefLabel "Twain, Shania" ;
     skos:altLabel "Twain, Eilleen" ,
                   "Edwards, Eilleen" .
4. QA returns normalized results to the UI:
   [{"uri": "http://id.worldcat.org/fast/31622", "id": "31622", "label": "Twain, Mark, 1835-1910"},
    {"uri": "http://id.worldcat.org/fast/365563", "id": "365563", "label": "Twain, Shania"}]
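The normalization step can be pictured as collapsing the authority's triples into one small record per subject. The sketch below is illustrative only and is not the QA implementation: it takes already-parsed triples with prefixed predicate names, and it assumes the id and label come from dcterms:identifier and skos:prefLabel as in the OCLC FAST example.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def normalize(triples: Iterable[Triple]) -> List[Dict[str, str]]:
    """Collapse triples into uri/id/label records, in first-seen subject order."""
    records: Dict[str, Dict[str, str]] = {}
    for subject, predicate, obj in triples:
        record = records.setdefault(subject, {"uri": subject})
        if predicate == "dcterms:identifier":
            record["id"] = obj
        elif predicate == "skos:prefLabel":
            record["label"] = obj
        # Other predicates (e.g. skos:altLabel) are ignored in this minimal form.
    return list(records.values())
```

Because every authority ends up in the same uri/id/label shape, the UI never needs to know which ontology the source used.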
Direct Access Query API
Direct against authority…
http://experimental.worldcat.org/fast/search
  ?query=oclc.personalName+%22twain%22
  &maximumRecords=2
http://api.geonames.org/search
  ?q=ithaca
  &maxRows=2
  &username=demo
  &type=rdf
http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search
  ?query=milk
  &lang=en
  &maxhits=2
Normalized Query API
Through QA normalization layer…
http://localhost:3000/qa/search/linked_data/oclc_fast
  ?q=twain
  &maxRecords=2
http://localhost:3000/qa/search/linked_data/geonames
  ?q=ithaca
  &maxRecords=2
http://localhost:3000/qa/search/linked_data/agrovoc
  ?q=milk
  &maxRecords=2
  &lang=en
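Comparing the direct and normalized queries, each authority names its query and limit parameters differently. A rough sketch of that translation, with endpoints and parameter names taken from the direct-access examples; the lookup table and function are hypothetical and are not how QA stores its configuration:

```python
from urllib.parse import urlencode

# authority -> (endpoint, query parameter name, result-limit parameter name)
AUTHORITIES = {
    "oclc_fast": ("http://experimental.worldcat.org/fast/search", "query", "maximumRecords"),
    "geonames": ("http://api.geonames.org/search", "q", "maxRows"),
    "agrovoc": ("http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search", "query", "maxhits"),
}


def direct_url(authority: str, q: str, max_records: int, **extra: str) -> str:
    """Translate normalized q/maxRecords values into an authority's own query URL."""
    base, query_param, limit_param = AUTHORITIES[authority]
    params = {query_param: q, limit_param: max_records, **extra}
    return f"{base}?{urlencode(params)}"
```

For instance, `direct_url("geonames", "ithaca", 2, username="demo", type="rdf")` reproduces the GeoNames request shown on the direct-access slide.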
Normalized Results
OCLC FAST:
[{"uri": "http://id.worldcat.org/fast/31622", "id": "31622", "label": "Twain, Mark, 1835-1910"},
 {"uri": "http://id.worldcat.org/fast/365563", "id": "365563", "label": "Twain, Shania"}]
GeoNames:
[{"uri": "http://sws.geonames.org/2162552/", "id": "http://sws.geonames.org/2162552/", "label": "Ithaca (AU)"},
 {"uri": "http://sws.geonames.org/4515289/", "id": "http://sws.geonames.org/4515289/", "label": "Ithaca (US)"}]
AgroVoc:
[{"uri": "http://aims.fao.org/aos/agrovoc/c_8602", "id": "http://aims.fao.org/aos/agrovoc/c_8602", "label": "acidophilus milk"},
 {"uri": "http://aims.fao.org/aos/agrovoc/c_16076", "id": "http://aims.fao.org/aos/agrovoc/c_16076", "label": "buffalo milk"}]
Second Set of Challenges
5. Reliability & Efficiency, e.g. server uptime, server load
6. Accuracy, e.g. select results based on usage data, lexical match, custom weighting, other
7. Order Ranking, e.g. How to order a graph?
Cache Server Query Process
Components (one full setup per authority): JSP Query API, Lucene/SOLR Index, Jena-Fuseki Triplestore
1. The JSP Query API receives a request:
   http://services.ld4l.org/ld4l_services/loc_name_batch.jsp?query=ezra%20cornell&maxRecords=10
2. Lucene search for "ezra cornell"; the index is built with the values of the predicates <skos:prefLabel> and <skos:altLabel>
3. For each result: extract search rank, extract URI
4. SPARQL query for the URI against the Jena-Fuseki triplestore
5. Insert search rank in predicate <http://vivoweb.org/ontology/core#rank>
6. Combine all results
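The query process can be sketched end to end with in-memory stand-ins. Everything here is an assumption made for illustration: the dictionaries replace the Lucene index and the Fuseki triplestore, the example.org URIs are placeholders, and the term-count score is a crude substitute for Lucene's ranking. The point is only that the search rank is written into the returned graph as a triple so the caller can restore the order.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
RANK = "http://vivoweb.org/ontology/core#rank"

# Stand-in for the Lucene index: indexed label text -> URI.
INDEX: Dict[str, str] = {
    "cornell, ezra": "http://example.org/names/1",
    "cornell university": "http://example.org/names/2",
    "twain, mark": "http://example.org/names/3",
}
# Stand-in for the triplestore: URI -> triples describing that URI.
STORE: Dict[str, List[Triple]] = {
    uri: [(uri, "skos:prefLabel", label)] for label, uri in INDEX.items()
}


def search(query: str, max_records: int) -> List[Triple]:
    """Score index entries, fetch triples per hit, and add a rank triple to each."""
    terms = query.lower().split()
    scored = [(sum(t in label for t in terms), uri) for label, uri in INDEX.items()]
    hits = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])[:max_records]
    graph: List[Triple] = []
    for rank, (_, uri) in enumerate(hits, start=1):
        graph.extend(STORE[uri])              # stands in for the SPARQL query for the URI
        graph.append((uri, RANK, str(rank)))  # the rank travels inside the graph
    return graph
```

Carrying the rank in the graph matters because RDF graphs are unordered; without it the Lucene ordering would be lost once the per-URI results are combined.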
UI-QA-Authority
Hyrax/VitroLib – UI for selecting an entry from an authority
QA – normalize RDF returned from an authority
• Request to QA: http://localhost:3000/qa/search/linked_data/oclc_fast/personal_name?q=twain&maximumRecords=2
• Normalized response:
  [{"uri": "http://id.worldcat.org/fast/31622", "id": "31622", "label": "Twain, Mark, 1835-1910"},
   {"uri": "http://id.worldcat.org/fast/365563", "id": "365563", "label": "Twain, Shania"}]
• Request to authority: http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&sortKeys=usage&maximumRecords=2
• RDF of search results
Data sources:
• Direct Access of External Authority
• Active-Triples LDF Cache (Marmotta or Blazegraph)
• LDF Cache
• Jena-Fuseki-Lucene Cache (search of cache performed via Lucene index)
Third Set of Challenges
8. Disambiguation through better context, e.g. expand from just prefLabel to… prefLabel, altLabel, birth/death dates, occupation, etc.
9. Reconciliation across multiple sources, e.g. match LoC URI to OCLC FAST URI
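To make challenge 9 concrete, the most naive form of reconciliation pairs URIs from two authorities whose labels match after light normalization. This is only a toy illustration of the problem, not the approach taken in the project; real reconciliation has to deal with variant headings, dates, and false matches. The function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def normalize_label(label: str) -> str:
    """Lowercase a label and drop punctuation (hyphens kept for date ranges)."""
    return " ".join(re.sub(r"[^\w\s-]", " ", label.lower()).split())


def reconcile(left: Dict[str, str], right: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair URIs from two URI -> label maps whose normalized labels are equal."""
    by_label = {normalize_label(label): uri for uri, label in right.items()}
    pairs: List[Tuple[str, str]] = []
    for uri, label in left.items():
        key = normalize_label(label)
        if key in by_label:
            pairs.append((uri, by_label[key]))
    return pairs
```

Exact label matching misses alternate forms entirely, which is one reason the richer context from challenge 8 (altLabel, dates, occupation) is useful for reconciliation as well as for display.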
What's next?
Addressing Architectural Challenges
• Generalize the process for accessing context on the cache server and in the normalization layer
• Multi-authority search and reconciliation
• Address the need for cache refresh
• Mirrored cache servers
User Experience and Design
• User-centered Design
  • Observe, listen, learn, design, evaluate, iterate
• Iteratively design and evaluate UI for lookup/authorities with catalogers
• Search result ranking/ordering/filtering for catalogers
• Additional UI platforms, e.g. FOLIO
Questions
http://tinyurl.com/ld4l-auth-access
Appendix for Challenges 1-4
Challenge 1: Documentation
LoC: http://id.loc.gov/techcenter/
  C. Harlow notes on reconciling LoC - https://github.com/cmh2166/lc-reconcile
OCLC FAST: https://www.oclc.org/developer/develop/web-services/fast-api/linked-data.en.html
GeoNames: http://www.geonames.org/export/geonames-search.html
AGROVOC: http://aims.fao.org/vest-registry/vocabularies/agrovoc-multilingual-agricultural-thesaurus
  swagger config: https://github.com/NatLibFi/Skosmos/blob/master/swagger.json
NALT: https://agclass.nal.usda.gov/
DBpedia: http://wiki.dbpedia.org/OnlineAccess#1.2%20Public%20Faceted%20Web%20Service%20Interface
Challenge 2: Linked Data Access API
LoC
  for Search Query: not supported
  for Term Fetch: URI
OCLC FAST
  for Search Query: http://experimental.worldcat.org/fast/search?query={subauth}+all+%22{query}%22&sortKeys=usage&maximumRecords={maximumRecords}
  for Term Fetch: URI
GeoNames
  for Search Query: http://api.geonames.org/search?q={query}&maxRows={maxRows}&username={username}&type=rdf
  for Term Fetch: URI
AGROVOC
  for Search Query: http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search?query={query}&lang={lang}
  for Term Fetch: http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/data?uri=http://aims.fao.org/aos/agrovoc/{term_id}
NALT
  for Search Query: http://skosmos.library.cornell.edu/rest/v1/nalt/search?query={query}&lang={lang}
  for Term Fetch: http://skosmos.library.cornell.edu/rest/v1/nalt/data?uri={term_uri}
DBpedia
  (blank)
Challenge 3: Varying Results Formats
Authority    for Search Query    for Term Fetch
LoC          not supported       rdf-xml
OCLC FAST    rdf-xml             rdf-xml
GeoNames     rdf-xml             rdf-xml
AGROVOC      json-ld             rdf-xml, json-ld, turtle
NALT         json-ld             rdf-xml, json-ld, turtle
DBpedia      (blank)             (blank)
Challenge 4: Varying Ontologies
Authority    Primary Ontology     Flat vs. Navigation Required
LoC          madsrdf, SKOS        navigation required
OCLC FAST    schema.org, SKOS     flat
GeoNames     geonames             flat, hierarchical
AGROVOC      SKOS                 flat, hierarchical
NALT         SKOS                 flat, hierarchical
DBpedia      dbpedia              flat
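One consequence of varying ontologies is that the display label sits under a different predicate in each source. A minimal sketch of a per-ontology lookup follows. The predicate choices are assumptions based on each vocabulary's usual label property, not on the Questioning Authority configurations, and the properties are given as a simple predicate -> value map.

```python
from typing import Dict, List, Optional

# Label predicates to try, in order, for each primary ontology (assumed values).
LABEL_PREDICATES: Dict[str, List[str]] = {
    "madsrdf": ["madsrdf:authoritativeLabel", "skos:prefLabel"],
    "schema.org": ["skos:prefLabel", "schema:name"],
    "geonames": ["gn:name"],
    "SKOS": ["skos:prefLabel"],
    "dbpedia": ["rdfs:label"],
}


def pick_label(ontology: str, properties: Dict[str, str]) -> Optional[str]:
    """Return the first label found for a term described with the given ontology."""
    for predicate in LABEL_PREDICATES.get(ontology, []):
        if predicate in properties:
            return properties[predicate]
    return None
```

Keeping this mapping in data rather than code is the same idea as a per-authority configuration file: adding an authority means adding an entry, not changing the lookup logic.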
Configurations for Questioning Authority
LoC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_loc/config/authorities/linked_data
OCLC FAST: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_oclcfast/config/authorities/linked_data
GeoNames: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_geonames/config/authorities/linked_data
AGROVOC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_agrovoc/config/authorities/linked_data
NALT: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_nalt/config/authorities/linked_data
DBpedia: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_dbpedia/config/authorities/linked_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
bull 8-core 64gb 3Ghz Mac Pro (late 2013) macOS Sierra
(10126)
bull 32tb Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
bull Apache Jena Fuseki 240 provides SPARQL endpoint
bull Apache Tomcat 90 runs custom web application(s)
bull Apache Lucene 36 provides search interface
64
Customizations
bull custom per-data-source JSP web application provides
searchbrowsedownload functionality
bull custom (generic) SPARQL Tag Library provides API for web
apps (available at httpsgithubcomeichmannlod-utilities)
bull custom (generic) Lucene Tag Library provides API for web apps
65
Loading a New Vocabulary
bull download RDF
bull if necessary convert to n-triples (required for GeoNames data for instance)
bull use tdbloader2 to populated triplestore
bull configure Fuseki server(s) with triplestore details
bull create new JSP project in Eclipse
bull write one or more indexer programs that populate Lucene indices and run indexer(s)
bull write searchbrowsedownload application logic using the SPARQL and Lucene tags
bull package project as war
bull deploy to Apache Tomcat server(s)
bull add new service to Apache HTTPD virtual host specification
66
UI Access to Cache Server
httpservicesld4lorgld4l_servicesloc_namejsp
Downloads
68
LoC httpidlocgovdownload (n-triples OR rdf-xml)
OCLC FAST httpwwwoclcorgresearchthemesdata-sciencefastdownloadhtml (n-triples)
GeoNames httpwwwgeonamesorgontologydocumentationhtml (custom format ndash see notes for processing)
AGROVOC httpsaims-faoatlassiannetwikispacesAGVpages2949126Releases (n-triples OR rdf-xml)
NALT httpsagclassnalusdagovdownloadshtml (rdf-xml)
DBpedia httpwikidbpediaorgdownloads-2016-04
Potential Options for Reconciliation
bull VIAF for name reconciliation ndash we are doing some
work with this
bull Wikidata ndash Ive heard that they are working on
Reconciliation issues but havent yet explored in
depth bull Intro Video (3hrs)
bull API Access
bull SPARQL ndash user manual
bull federated queries with other authorities
Doing a google search for linked data reconciliation
returns a large number of articles and presentations
on this concept
Links to Code amp More
bull QA Server - Code for a small app that provides the
Questioning Authority normalization layer
bull Linked Data Authorities - Configurations that can
be used with QA Server
bull LD4L Services - UI access to Cache Server
bull VitroLib - Code for the VitroLib cataloging tool
23
Hyrax Demo
Autocomplete Saving String and URI
Authority OCLC FAST Subauthority PersonName
Selected String and URI
Saves both string and URI
Selecting a Term using
Lookup with Context
26
Selecting a Term using
Lookup with Context
27
Getting more from the same authority
Getting more from other authorities
30
Architecture
Technical Motivation
bull Linked data provideshellip
bull URIs that identify specific terms (as opposed to ambiguity of using
strings)
bull Reconciliation to relate terms that are defined in separate authorities
bull Goals of implementationhellip
bull Provide a single process to access many authorities
bull Provide efficient and reliable access to authorities
bull Provide a means for disambiguation that empowers library staff to
make the most accurate selections
First Set of Challenges
1 Finding Documentation
2 Linked Data Access API eg no support partial support requires login credentials
sparql query endpoint only
3 Varying Results Formats eg rdf-xml json-ld turtle n-triples etc
4 Varying Ontologies eg SKOS schemaorg madsrdf dbpedia geonames
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
httpexperimentalworldcatorgfast
searchquery=oclcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
httpexperimentalworldcatorgfast
searchquery=oclcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
lthttpidworldcatorgfast31622gt
a schemaPerson
dctermsidentifier 31622
skosprefLabel Twain Mark 1835-1910
skosaltLabel Make Teviin 1835-1910
Make Tuwen 1835-1910
lthttpidworldcatorgfast365563gt
a schemaPerson
dctermsidentifier 365563
skosprefLabel Twain Shania
skosaltLabel Twain Eilleen
Edwards Eilleen
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
[urihttpidworldcatorgfast31622
id31622 labelTwain Mark 1835-1910
urihttpidworldcatorgfast365563
id365563labelTwain Shania ]
httpexperimentalworldcatorgfast
searchquery=oclcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
lthttpidworldcatorgfast31622gt
a schemaPerson
dctermsidentifier 31622
skosprefLabel Twain Mark 1835-1910
skosaltLabel Make Teviin 1835-1910
Make Tuwen 1835-1910
lthttpidworldcatorgfast365563gt
a schemaPerson
dctermsidentifier 365563
skosprefLabel Twain Shania
skosaltLabel Twain Eilleen
Edwards Eilleen
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Direct Access Query API
Direct against authorityhellip
httpexperimentalworldcatorgfastsearch
query=oclcpersonalName+22twain22
ampmaximumRecords=2
httpapigeonamesorgsearch
q=ithaca
ampmaxRows=2
ampusername=demo
amptype=rdf
httpartemideartuniroma2it8081agrovocrestv1search
query=milk
amplang=en
ampmaxhits=2
Normalized Query API
Through QA normalization layerhellip
httplocalhost3000qasearchlinked_dataoclc_fast
q=twain
ampmaxRecords=2
httplocalhost3000qasearchlinked_datageonames
q=ithaca
ampmaxRecords=2
httplocalhost3000qasearchlinked_dataagrovoc
q=milk
ampmaxRecords=2
amplang=en
Normalized Results
[urihttpidworldcatorgfast31622 id31622 labelTwain Mark 1835-1910 urihttpidworldcatorgfast365563 id365563 labelTwain Shania]
[uri httpswsgeonamesorg2162552 id httpswsgeonamesorg2162552 label Ithaca (AU) uri httpswsgeonamesorg4515289 id httpswsgeonamesorg4515289 label Ithaca (US)]
[uri httpaimsfaoorgaosagrovocc_8602 id httpaimsfaoorgaosagrovocc_8602 label acidophilus milk uri httpaimsfaoorgaosagrovocc_16076 id httpaimsfaoorgaosagrovocc_16076 label buffalo milk]
OCLC FAST GeoNames AgroVoc
Second Set of Challenges
5 Reliability amp Efficiency eg server uptime server load
6 Accuracy eg select results based on usage data lexical match
custom weighting other
7 Order Ranking eg How to order a graph
Cache Server Query Process
JSP Query API
Jena-Fuseki
Triplestore
One full setup per authority
LuceneSOLR
Index
Cache Server Query Process
JSP Query API
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
Jena-Fuseki
Triplestore
LuceneSOLR
Index
One full setup per authority
Cache Server Query Process
JSP Query API
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
Jena-Fuseki
Triplestore
LuceneSOLR
Index
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
extract search rank
extract URI
Jena-Fuseki
Triplestore
for each result
LuceneSOLR
Index
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
sparql query for URI
Jena-Fuseki
Triplestore
LuceneSOLR
Index
extract search rank
extract URI
for each result
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
combine all results
Jena-Fuseki
Triplestore
insert search rank in predicate
lthttpvivoweborgontology
corerankgt
LuceneSOLR
Index
sparql query for URI extract search rank
extract URI
for each result
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
UI-QA-Authority
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainampmaximumRecords=2
[urihttpidworldcatorgfast31622id31622
labelTwain Mark 1835-1910
urihttpidworldcatorgfast365563id365563
labelTwain Shania
httpexperimentalworldcatorgfastsearchquery=o
clcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
RDF of
search
results
Active-Triples
LDF Cache
(Marmotta or
Blazegraph) LDF Cache Jena-Fuseki-
Lucene
Cache
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
search of cache performed via Lucene index
Third Set of Challenges
8 Disambiguation through better context eg expand from just prefLabel tohellip
preLabel altLabel birthdeath dates occupation etc
9 Reconciliation across multiple sources eg match LoC URI to OCLC FAST URI
53
Whatrsquos next
Addressing Architectural Challenges
bull Generalize process for accessing context on the
cache server and in the normalization layer
bull Multi-authority search and reconciliation
bull Address the need for cache refresh
bull Mirrored cache servers
User Experience and Design
bull User-centered Design
bull Observe listen learn design evaluate iterate
bull Iteratively design and evaluate UI for lookupauthorities
with catalogers
bull Search result rankingorderingfiltering for catalogers
bull Additional UI platforms eg FOLIO
56
Questions
httptinyurlcomld4l-auth-access
Appendix for Challenges 1-4
Challenge 1 Documentation
58
LoC httpidlocgovtechcenter
C Harlow notes on reconciling LoC - httpsgithubcomcmh2166lc-reconcile
OCLC FAST
httpswwwoclcorgdeveloperdevelopweb-servicesfast-apilinked-dataenhtml
GeoNames
httpwwwgeonamesorgexportgeonames-searchhtml
AGROVOC httpaimsfaoorgvest-registryvocabulariesagrovoc-multilingual-agricultural-thesaurus
swagger config httpsgithubcomNatLibFiSkosmosblobmasterswaggerjson
NALT
httpsagclassnalusdagov
DBpedia httpwikidbpediaorgOnlineAccess1220Public20Faceted20Web20Service20Inter
face
Challenge 2 Linked Data Access API
59
for Search Query for Term Fetch
LoC not supported URI
OCLC FAST httpexperimentalworldcatorgfastsearchq
uery=subauth+all+22query22ampsortK
eys=usageampmaximumRecords=maximumR
ecords
URI
GeoNames httpapigeonamesorgsearchq=queryamp
maxRows=maxRowsampusername=userna
meamptype=rdf
URI
AGROVOC httpartemideartuniroma2it8081agrovocr
estv1searchquery=queryamplang=lang
httpartemideartuniroma2it8081agrovo
crestv1datauri=httpaimsfaoorgaosa
grovocterm_id
NALT httpskosmoslibrarycornelledurestv1nalt
searchquery=queryamplang=lang
httpskosmoslibrarycornelledurestv1na
ltdatauri=term_uri
DBpedia
Challenge 3 Varying Results Formats
60
for Search Query for Term Fetch
LoC not supported rdf-xml
OCLC FAST rdf-xml rdf-xml
GeoNames rdf-xml rdf-xml
AGROVOC json-ld rdf-xml json-ld turtle
NALT json-ld rdf-xml json-ld turtle
DBpedia
Challenge 4 Varying Ontologies
61
Primary Ontology Flat vs Navigation
required
LoC madsrdf
SKOS
navigation required
OCLC FAST schemaorg
SKOS
flat
GeoNames geonames flat
hierarchical
AGROVOC SKOS flat
hierarchical
NALT SKOS flat
hierarchical
DBpedia dbpedia flat
Configurations for Questioning Authority
62
LoC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_locconfigauthoritieslinked_dat
a
OCLC FAST httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_oclcfastconfigauthoritieslinked
_data
GeoNames httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_geonamesconfigauthoritieslink
ed_data
AGROVOC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_agrovocconfigauthoritieslinked
_data
NALT httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_naltconfigauthoritieslinked_dat
a
DBpedia httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_dbpediaconfigauthoritieslinked
_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
bull 8-core 64gb 3Ghz Mac Pro (late 2013) macOS Sierra
(10126)
bull 32tb Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
bull Apache Jena Fuseki 240 provides SPARQL endpoint
bull Apache Tomcat 90 runs custom web application(s)
bull Apache Lucene 36 provides search interface
64
Customizations
bull custom per-data-source JSP web application provides
searchbrowsedownload functionality
bull custom (generic) SPARQL Tag Library provides API for web
apps (available at httpsgithubcomeichmannlod-utilities)
bull custom (generic) Lucene Tag Library provides API for web apps
65
Loading a New Vocabulary
bull download RDF
bull if necessary convert to n-triples (required for GeoNames data for instance)
bull use tdbloader2 to populated triplestore
bull configure Fuseki server(s) with triplestore details
bull create new JSP project in Eclipse
bull write one or more indexer programs that populate Lucene indices and run indexer(s)
bull write searchbrowsedownload application logic using the SPARQL and Lucene tags
bull package project as war
bull deploy to Apache Tomcat server(s)
bull add new service to Apache HTTPD virtual host specification
66
UI Access to Cache Server
httpservicesld4lorgld4l_servicesloc_namejsp
Downloads
68
LoC httpidlocgovdownload (n-triples OR rdf-xml)
OCLC FAST httpwwwoclcorgresearchthemesdata-sciencefastdownloadhtml (n-triples)
GeoNames httpwwwgeonamesorgontologydocumentationhtml (custom format ndash see notes for processing)
AGROVOC httpsaims-faoatlassiannetwikispacesAGVpages2949126Releases (n-triples OR rdf-xml)
NALT httpsagclassnalusdagovdownloadshtml (rdf-xml)
DBpedia httpwikidbpediaorgdownloads-2016-04
Potential Options for Reconciliation
bull VIAF for name reconciliation ndash we are doing some
work with this
bull Wikidata ndash Ive heard that they are working on
Reconciliation issues but havent yet explored in
depth bull Intro Video (3hrs)
bull API Access
bull SPARQL ndash user manual
bull federated queries with other authorities
Doing a google search for linked data reconciliation
returns a large number of articles and presentations
on this concept
Links to Code amp More
bull QA Server - Code for a small app that provides the
Questioning Authority normalization layer
bull Linked Data Authorities - Configurations that can
be used with QA Server
bull LD4L Services - UI access to Cache Server
bull VitroLib - Code for the VitroLib cataloging tool
Autocomplete Saving String and URI
Authority OCLC FAST Subauthority PersonName
Selected String and URI
Saves both string and URI
Selecting a Term using
Lookup with Context
26
Selecting a Term using
Lookup with Context
27
Getting more from the same authority
Getting more from other authorities
30
Architecture
Technical Motivation
bull Linked data provideshellip
bull URIs that identify specific terms (as opposed to ambiguity of using
strings)
bull Reconciliation to relate terms that are defined in separate authorities
bull Goals of implementationhellip
bull Provide a single process to access many authorities
bull Provide efficient and reliable access to authorities
bull Provide a means for disambiguation that empowers library staff to
make the most accurate selections
First Set of Challenges
1 Finding Documentation
2 Linked Data Access API eg no support partial support requires login credentials
sparql query endpoint only
3 Varying Results Formats eg rdf-xml json-ld turtle n-triples etc
4 Varying Ontologies eg SKOS schemaorg madsrdf dbpedia geonames
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
httpexperimentalworldcatorgfast
searchquery=oclcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
httpexperimentalworldcatorgfast
searchquery=oclcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
lthttpidworldcatorgfast31622gt
a schemaPerson
dctermsidentifier 31622
skosprefLabel Twain Mark 1835-1910
skosaltLabel Make Teviin 1835-1910
Make Tuwen 1835-1910
lthttpidworldcatorgfast365563gt
a schemaPerson
dctermsidentifier 365563
skosprefLabel Twain Shania
skosaltLabel Twain Eilleen
Edwards Eilleen
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
[urihttpidworldcatorgfast31622
id31622 labelTwain Mark 1835-1910
urihttpidworldcatorgfast365563
id365563labelTwain Shania ]
httpexperimentalworldcatorgfast
searchquery=oclcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
lthttpidworldcatorgfast31622gt
a schemaPerson
dctermsidentifier 31622
skosprefLabel Twain Mark 1835-1910
skosaltLabel Make Teviin 1835-1910
Make Tuwen 1835-1910
lthttpidworldcatorgfast365563gt
a schemaPerson
dctermsidentifier 365563
skosprefLabel Twain Shania
skosaltLabel Twain Eilleen
Edwards Eilleen
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Direct Access Query API
Direct against authorityhellip
httpexperimentalworldcatorgfastsearch
query=oclcpersonalName+22twain22
ampmaximumRecords=2
httpapigeonamesorgsearch
q=ithaca
ampmaxRows=2
ampusername=demo
amptype=rdf
httpartemideartuniroma2it8081agrovocrestv1search
query=milk
amplang=en
ampmaxhits=2
Normalized Query API
Through QA normalization layerhellip
httplocalhost3000qasearchlinked_dataoclc_fast
q=twain
ampmaxRecords=2
httplocalhost3000qasearchlinked_datageonames
q=ithaca
ampmaxRecords=2
httplocalhost3000qasearchlinked_dataagrovoc
q=milk
ampmaxRecords=2
amplang=en
Normalized Results
[urihttpidworldcatorgfast31622 id31622 labelTwain Mark 1835-1910 urihttpidworldcatorgfast365563 id365563 labelTwain Shania]
[uri httpswsgeonamesorg2162552 id httpswsgeonamesorg2162552 label Ithaca (AU) uri httpswsgeonamesorg4515289 id httpswsgeonamesorg4515289 label Ithaca (US)]
[uri httpaimsfaoorgaosagrovocc_8602 id httpaimsfaoorgaosagrovocc_8602 label acidophilus milk uri httpaimsfaoorgaosagrovocc_16076 id httpaimsfaoorgaosagrovocc_16076 label buffalo milk]
OCLC FAST GeoNames AgroVoc
Second Set of Challenges
5 Reliability amp Efficiency eg server uptime server load
6 Accuracy eg select results based on usage data lexical match
custom weighting other
7 Order Ranking eg How to order a graph
Cache Server Query Process
One full setup per authority: JSP Query API, Lucene/SOLR Index, Jena-Fuseki Triplestore
1. JSP Query API receives the request:
   http://services.ld4l.org/ld4l_services/loc_name_batch.jsp?query=ezra%20cornell&maxRecords=10
2. Lucene search for "ezra cornell" (index built with predicate values <skos:prefLabel> and <skos:altLabel>)
3. For each result: extract search rank, extract URI
4. SPARQL query for URI (Jena-Fuseki Triplestore)
5. Insert search rank in predicate <http://vivoweb.org/ontology/core#rank>
6. Combine all results
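The query process above can be sketched in miniature. This is not the JSP/Lucene/Fuseki implementation: the in-memory dictionary stands in for the triplestore, a simple token index stands in for Lucene, the scoring (number of matching query tokens) is an assumption, and the example.org URIs and labels are made-up illustration data.

```python
import re
from collections import defaultdict

PREF = "skos:prefLabel"
ALT = "skos:altLabel"
RANK = "http://vivoweb.org/ontology/core#rank"

# Toy "triplestore": URI -> list of (predicate, object). Illustration data only.
TRIPLES = {
    "http://example.org/auth/1": [(PREF, "Cornell, Ezra, 1807-1874"), (ALT, "Ezra Cornell")],
    "http://example.org/auth/2": [(PREF, "Cornell University")],
    "http://example.org/auth/3": [(PREF, "Pound, Ezra, 1885-1972")],
}


def tokens(text):
    """Lower-case word tokens of a label or query."""
    return set(re.findall(r"\w+", text.lower()))


def build_index(store):
    """Index built from prefLabel and altLabel values (stand-in for Lucene)."""
    index = defaultdict(set)
    for uri, pairs in store.items():
        for predicate, value in pairs:
            if predicate in (PREF, ALT):
                for token in tokens(value):
                    index[token].add(uri)
    return index


def search(index, query, max_records):
    """Return URIs ordered by number of matching query tokens (assumed scoring)."""
    scores = defaultdict(int)
    for token in tokens(query):
        for uri in index.get(token, ()):
            scores[uri] += 1
    ranked = sorted(scores, key=lambda uri: (-scores[uri], uri))
    return ranked[:max_records]


def query_cache(store, index, query, max_records):
    """For each hit: fetch its triples, insert the search rank, combine the results."""
    results = []
    for rank, uri in enumerate(search(index, query, max_records), start=1):
        triples = list(store[uri])       # stands in for the SPARQL query for the URI
        triples.append((RANK, rank))     # rank travels with the graph
        results.append({"uri": uri, "triples": triples})
    return results


INDEX = build_index(TRIPLES)

if __name__ == "__main__":
    for hit in query_cache(TRIPLES, INDEX, "ezra cornell", 2):
        print(hit["uri"], hit["triples"][-1])
```

Storing the rank as a triple is what lets a client recover the search order from a graph, which has no inherent ordering (challenge 7).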
UI-QA-Authority
Hyrax/VitroLib – UI for selecting an entry from an authority
  http://localhost:3000/qa/search/linked_data/oclc_fast/personal_name?q=twain&maximumRecords=2
  [{"uri":"http://id.worldcat.org/fast/31622","id":"31622","label":"Twain, Mark, 1835-1910"},
   {"uri":"http://id.worldcat.org/fast/365563","id":"365563","label":"Twain, Shania"}]
QA – normalize RDF returned from an authority
  http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&sortKeys=usage&maximumRecords=2
  RDF of search results
Authority access options
• Direct Access of External Authority
• Active-Triples LDF Cache (Marmotta or Blazegraph)
• Jena-Fuseki-Lucene Cache (search of cache performed via Lucene index)
Third Set of Challenges
8. Disambiguation through better context, e.g., expand from just prefLabel to… prefLabel, altLabel, birth/death dates, occupation, etc.
9. Reconciliation across multiple sources, e.g., match LoC URI to OCLC FAST URI
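Both challenges can be illustrated with a rough sketch. The record layout, the function names, and the example.org LoC-style URI are hypothetical; matching on normalized labels is only one crude reconciliation strategy chosen for illustration, not the approach the project settled on.

```python
import re


def context_line(record):
    """Build a display string with more context than the prefLabel alone."""
    parts = [record["prefLabel"]]
    alts = record.get("altLabel", [])
    if alts:
        parts.append("also: " + "; ".join(alts))
    if record.get("occupation"):
        parts.append("occupation: " + record["occupation"])
    return " | ".join(parts)


def normalize_label(label):
    """Lower-case, drop punctuation, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", label.lower()).split())


def reconcile(left, right):
    """Pair URIs from two authorities whose normalized prefLabels are identical."""
    by_label = {normalize_label(r["prefLabel"]): r["uri"] for r in right}
    pairs = []
    for record in left:
        match = by_label.get(normalize_label(record["prefLabel"]))
        if match:
            pairs.append((record["uri"], match))
    return pairs


# Illustration data: the FAST URIs and labels come from the slides,
# the LoC-style URI is a made-up placeholder.
FAST = [
    {"uri": "http://id.worldcat.org/fast/31622", "prefLabel": "Twain, Mark, 1835-1910"},
    {"uri": "http://id.worldcat.org/fast/365563", "prefLabel": "Twain, Shania",
     "altLabel": ["Twain, Eilleen", "Edwards, Eilleen"]},
]
LOC = [{"uri": "http://example.org/loc/n1", "prefLabel": "Twain, Mark, 1835-1910."}]

if __name__ == "__main__":
    print(context_line(FAST[1]))
    print(reconcile(LOC, FAST))
```

Exact label matching misses variant forms and can conflate distinct people with identical headings, which is why richer context (dates, occupation) matters for both tasks.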
What's next
Addressing Architectural Challenges
• Generalize process for accessing context on the cache server and in the normalization layer
• Multi-authority search and reconciliation
• Address the need for cache refresh
• Mirrored cache servers
User Experience and Design
• User-centered Design
  • Observe, listen, learn, design, evaluate, iterate
• Iteratively design and evaluate UI for lookup/authorities with catalogers
• Search result ranking/ordering/filtering for catalogers
• Additional UI platforms, e.g., FOLIO
Questions
http://tinyurl.com/ld4l-auth-access
Appendix for Challenges 1-4
Challenge 1: Documentation
LoC: http://id.loc.gov/techcenter/
  C. Harlow notes on reconciling LoC: https://github.com/cmh2166/lc-reconcile
OCLC FAST: https://www.oclc.org/developer/develop/web-services/fast-api/linked-data.en.html
GeoNames: http://www.geonames.org/export/geonames-search.html
AGROVOC: http://aims.fao.org/vest-registry/vocabularies/agrovoc-multilingual-agricultural-thesaurus
  swagger config: https://github.com/NatLibFi/Skosmos/blob/master/swagger.json
NALT: https://agclass.nal.usda.gov/
DBpedia: http://wiki.dbpedia.org/OnlineAccess#1.2%20Public%20Faceted%20Web%20Service%20Interface
Challenge 2: Linked Data Access API
LoC
  for Search Query: not supported
  for Term Fetch: URI
OCLC FAST
  for Search Query: http://experimental.worldcat.org/fast/search?query={subauth}+all+%22{query}%22&sortKeys=usage&maximumRecords={maximumRecords}
  for Term Fetch: URI
GeoNames
  for Search Query: http://api.geonames.org/search?q={query}&maxRows={maxRows}&username={username}&type=rdf
  for Term Fetch: URI
AGROVOC
  for Search Query: http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search?query={query}&lang={lang}
  for Term Fetch: http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/data?uri=http://aims.fao.org/aos/agrovoc/{term_id}
NALT
  for Search Query: http://skosmos.library.cornell.edu/rest/v1/nalt/search?query={query}&lang={lang}
  for Term Fetch: http://skosmos.library.cornell.edu/rest/v1/nalt/data?uri={term_uri}
DBpedia
Challenge 3: Varying Results Formats
Authority    for Search Query    for Term Fetch
LoC          not supported       rdf-xml
OCLC FAST    rdf-xml             rdf-xml
GeoNames     rdf-xml             rdf-xml
AGROVOC      json-ld             rdf-xml, json-ld, turtle
NALT         json-ld             rdf-xml, json-ld, turtle
DBpedia
Challenge 4: Varying Ontologies
Authority    Primary Ontology     Flat vs. Navigation required
LoC          madsrdf, SKOS        navigation required
OCLC FAST    schema.org, SKOS     flat
GeoNames     geonames             flat, hierarchical
AGROVOC      SKOS                 flat, hierarchical
NALT         SKOS                 flat, hierarchical
DBpedia      dbpedia              flat
Configurations for Questioning Authority
LoC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_loc/config/authorities/linked_data
OCLC FAST: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_oclcfast/config/authorities/linked_data
GeoNames: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_geonames/config/authorities/linked_data
AGROVOC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_agrovoc/config/authorities/linked_data
NALT: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_nalt/config/authorities/linked_data
DBpedia: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_dbpedia/config/authorities/linked_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
• 8-core, 64 GB, 3 GHz Mac Pro (late 2013), macOS Sierra (10.12.6)
• 32 TB Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
• Apache Jena Fuseki 2.4.0 provides SPARQL endpoint
• Apache Tomcat 9.0 runs custom web application(s)
• Apache Lucene 3.6 provides search interface
Customizations
• custom per-data-source JSP web application provides search/browse/download functionality
• custom (generic) SPARQL Tag Library provides API for web apps (available at https://github.com/eichmann/lod-utilities)
• custom (generic) Lucene Tag Library provides API for web apps
Loading a New Vocabulary
• download RDF
• if necessary, convert to n-triples (required for GeoNames data, for instance)
• use tdbloader2 to populate the triplestore
• configure Fuseki server(s) with triplestore details
• create new JSP project in Eclipse
• write one or more indexer programs that populate Lucene indices, and run indexer(s)
• write search/browse/download application logic using the SPARQL and Lucene tags
• package project as WAR
• deploy to Apache Tomcat server(s)
• add new service to Apache HTTPD virtual host specification
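The indexer step in the list above can be sketched as follows. The real indexers populate Lucene indices from Java; this stand-in is a pure-Python token index over N-Triples label statements, and the function name, regular expression, and example.org sample data are assumptions made for illustration.

```python
import re
from collections import defaultdict

SKOS = "http://www.w3.org/2004/02/skos/core#"
LABEL_PREDICATES = {SKOS + "prefLabel", SKOS + "altLabel"}

# Matches an N-Triples statement whose object is a literal, with an optional
# language tag or datatype. Escape sequences in the literal are not decoded.
LITERAL_TRIPLE = re.compile(
    r'^<([^>]+)>\s+<([^>]+)>\s+"((?:[^"\\]|\\.)*)"'
    r'(?:@[A-Za-z0-9-]+|\^\^<[^>]+>)?\s*\.\s*$'
)


def index_labels(lines):
    """Map each lower-cased label token to the set of subject URIs that carry it."""
    index = defaultdict(set)
    for line in lines:
        match = LITERAL_TRIPLE.match(line.strip())
        if not match:
            continue  # blank line, comment, or statement with a non-literal object
        subject, predicate, literal = match.groups()
        if predicate in LABEL_PREDICATES:
            for token in re.findall(r"\w+", literal.lower()):
                index[token].add(subject)
    return index


# Made-up sample data in N-Triples form.
SAMPLE = """\
<http://example.org/c_1> <http://www.w3.org/2004/02/skos/core#prefLabel> "buffalo milk"@en .
<http://example.org/c_1> <http://www.w3.org/2004/02/skos/core#altLabel> "water buffalo milk"@en .
<http://example.org/c_2> <http://www.w3.org/2004/02/skos/core#prefLabel> "acidophilus milk"@en .
<http://example.org/c_2> <http://www.w3.org/2004/02/skos/core#broader> <http://example.org/c_0> .
""".splitlines()

if __name__ == "__main__":
    index = index_labels(SAMPLE)
    print(sorted(index["milk"]))
```

Working from N-Triples keeps the indexer line-oriented, which is one practical reason to convert other serializations first.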
UI Access to Cache Server
http://services.ld4l.org/ld4l_services/loc_name.jsp
Downloads
LoC: http://id.loc.gov/download/ (n-triples OR rdf-xml)
OCLC FAST: http://www.oclc.org/research/themes/data-science/fast/download.html (n-triples)
GeoNames: http://www.geonames.org/ontology/documentation.html (custom format – see notes for processing)
AGROVOC: https://aims-fao.atlassian.net/wiki/spaces/AGV/pages/2949126/Releases (n-triples OR rdf-xml)
NALT: https://agclass.nal.usda.gov/download.shtml (rdf-xml)
DBpedia: http://wiki.dbpedia.org/downloads-2016-04
Potential Options for Reconciliation
• VIAF for name reconciliation – we are doing some work with this
• Wikidata – I've heard that they are working on reconciliation issues, but haven't yet explored in depth
  • Intro Video (3 hrs)
  • API Access
  • SPARQL – user manual
  • federated queries with other authorities
Doing a Google search for "linked data reconciliation" returns a large number of articles and presentations on this concept.
Links to Code & More
• QA Server - Code for a small app that provides the Questioning Authority normalization layer
• Linked Data Authorities - Configurations that can be used with QA Server
• LD4L Services - UI access to Cache Server
• VitroLib - Code for the VitroLib cataloging tool
Selected String and URI
Saves both string and URI
Selecting a Term using Lookup with Context
Getting more from the same authority
Getting more from other authorities
Architecture
Technical Motivation
• Linked data provides…
  • URIs that identify specific terms (as opposed to ambiguity of using strings)
  • Reconciliation to relate terms that are defined in separate authorities
• Goals of implementation…
  • Provide a single process to access many authorities
  • Provide efficient and reliable access to authorities
  • Provide a means for disambiguation that empowers library staff to make the most accurate selections
First Set of Challenges
1. Finding Documentation
2. Linked Data Access API, e.g., no support, partial support, requires login credentials, SPARQL query endpoint only
3. Varying Results Formats, e.g., rdf-xml, json-ld, turtle, n-triples, etc.
4. Varying Ontologies, e.g., SKOS, schema.org, madsrdf, dbpedia, geonames
Multi-Server Architecture
Hyrax/VitroLib – UI for selecting an entry from an authority
  Request to QA:
  http://localhost:3000/qa/search/linked_data/oclc_fast/personal_name?q=twain&maximumRecords=2
  Normalized results from QA:
  [{"uri":"http://id.worldcat.org/fast/31622","id":"31622","label":"Twain, Mark, 1835-1910"},
   {"uri":"http://id.worldcat.org/fast/365563","id":"365563","label":"Twain, Shania"}]
QA – normalize RDF returned from an authority
  Request to the authority:
  http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&sortKeys=usage&maximumRecords=2
  RDF returned by the authority:
  <http://id.worldcat.org/fast/31622>
    a schema:Person ;
    dcterms:identifier "31622" ;
    skos:prefLabel "Twain, Mark, 1835-1910" ;
    skos:altLabel "Make Teviin, 1835-1910", "Make Tuwen, 1835-1910" .
  <http://id.worldcat.org/fast/365563>
    a schema:Person ;
    dcterms:identifier "365563" ;
    skos:prefLabel "Twain, Shania" ;
    skos:altLabel "Twain, Eilleen", "Edwards, Eilleen" .
Direct Access of External Authority
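The normalization step in this architecture can be sketched as a function that reduces authority RDF to uri/id/label entries. This is a simplified stand-in for what the Questioning Authority layer does: the RDF is represented as plain (subject, predicate, object) tuples instead of being parsed, and the function name and the idea of passing the id and label predicates as arguments are assumptions for illustration.

```python
import json

# Triples as (subject, predicate, object), taken from the FAST example on the slide.
FAST_TRIPLES = [
    ("http://id.worldcat.org/fast/31622", "rdf:type", "schema:Person"),
    ("http://id.worldcat.org/fast/31622", "dcterms:identifier", "31622"),
    ("http://id.worldcat.org/fast/31622", "skos:prefLabel", "Twain, Mark, 1835-1910"),
    ("http://id.worldcat.org/fast/31622", "skos:altLabel", "Make Teviin, 1835-1910"),
    ("http://id.worldcat.org/fast/365563", "rdf:type", "schema:Person"),
    ("http://id.worldcat.org/fast/365563", "dcterms:identifier", "365563"),
    ("http://id.worldcat.org/fast/365563", "skos:prefLabel", "Twain, Shania"),
    ("http://id.worldcat.org/fast/365563", "skos:altLabel", "Twain, Eilleen"),
]


def normalize(triples, id_predicate, label_predicate):
    """Reduce authority-specific RDF to a list of {uri, id, label} entries."""
    entries = {}  # insertion order keeps the authority's result order
    for subject, predicate, obj in triples:
        entry = entries.setdefault(subject, {"uri": subject})
        if predicate == id_predicate:
            entry.setdefault("id", obj)
        elif predicate == label_predicate:
            entry.setdefault("label", obj)
    results = []
    for entry in entries.values():
        if "label" not in entry:
            continue  # subjects without a label are not selectable in a UI
        # Fall back to the URI when the authority supplies no separate identifier.
        results.append({"uri": entry["uri"],
                        "id": entry.get("id", entry["uri"]),
                        "label": entry["label"]})
    return results


if __name__ == "__main__":
    print(json.dumps(normalize(FAST_TRIPLES, "dcterms:identifier", "skos:prefLabel"), indent=1))
```

Because the predicates are parameters, an authority that uses a different ontology needs a different mapping rather than different code, which is the point of a per-authority configuration.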
Direct Access Query API
Direct against authority…
http://experimental.worldcat.org/fast/search
?query=oclc.personalName+%22twain%22
&maximumRecords=2
http://api.geonames.org/search
?q=ithaca
&maxRows=2
&username=demo
&type=rdf
http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search
?query=milk
&lang=en
&maxhits=2
Normalized Query API
Through QA normalization layerhellip
httplocalhost3000qasearchlinked_dataoclc_fast
q=twain
ampmaxRecords=2
httplocalhost3000qasearchlinked_datageonames
q=ithaca
ampmaxRecords=2
httplocalhost3000qasearchlinked_dataagrovoc
q=milk
ampmaxRecords=2
amplang=en
Normalized Results
[urihttpidworldcatorgfast31622 id31622 labelTwain Mark 1835-1910 urihttpidworldcatorgfast365563 id365563 labelTwain Shania]
[uri httpswsgeonamesorg2162552 id httpswsgeonamesorg2162552 label Ithaca (AU) uri httpswsgeonamesorg4515289 id httpswsgeonamesorg4515289 label Ithaca (US)]
[uri httpaimsfaoorgaosagrovocc_8602 id httpaimsfaoorgaosagrovocc_8602 label acidophilus milk uri httpaimsfaoorgaosagrovocc_16076 id httpaimsfaoorgaosagrovocc_16076 label buffalo milk]
OCLC FAST GeoNames AgroVoc
Second Set of Challenges
5 Reliability amp Efficiency eg server uptime server load
6 Accuracy eg select results based on usage data lexical match
custom weighting other
7 Order Ranking eg How to order a graph
Cache Server Query Process
JSP Query API
Jena-Fuseki
Triplestore
One full setup per authority
LuceneSOLR
Index
Cache Server Query Process
JSP Query API
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
Jena-Fuseki
Triplestore
LuceneSOLR
Index
One full setup per authority
Cache Server Query Process
JSP Query API
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
Jena-Fuseki
Triplestore
LuceneSOLR
Index
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
extract search rank
extract URI
Jena-Fuseki
Triplestore
for each result
LuceneSOLR
Index
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
sparql query for URI
Jena-Fuseki
Triplestore
LuceneSOLR
Index
extract search rank
extract URI
for each result
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
combine all results
Jena-Fuseki
Triplestore
insert search rank in predicate
lthttpvivoweborgontology
corerankgt
LuceneSOLR
Index
sparql query for URI extract search rank
extract URI
for each result
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
UI-QA-Authority
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainampmaximumRecords=2
[urihttpidworldcatorgfast31622id31622
labelTwain Mark 1835-1910
urihttpidworldcatorgfast365563id365563
labelTwain Shania
httpexperimentalworldcatorgfastsearchquery=o
clcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
RDF of
search
results
Active-Triples
LDF Cache
(Marmotta or
Blazegraph) LDF Cache Jena-Fuseki-
Lucene
Cache
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
search of cache performed via Lucene index
Third Set of Challenges
8 Disambiguation through better context eg expand from just prefLabel tohellip
preLabel altLabel birthdeath dates occupation etc
9 Reconciliation across multiple sources eg match LoC URI to OCLC FAST URI
53
Whatrsquos next
Addressing Architectural Challenges
bull Generalize process for accessing context on the
cache server and in the normalization layer
bull Multi-authority search and reconciliation
bull Address the need for cache refresh
bull Mirrored cache servers
User Experience and Design
bull User-centered Design
bull Observe listen learn design evaluate iterate
bull Iteratively design and evaluate UI for lookupauthorities
with catalogers
bull Search result rankingorderingfiltering for catalogers
bull Additional UI platforms eg FOLIO
56
Questions
httptinyurlcomld4l-auth-access
Appendix for Challenges 1-4
Challenge 1 Documentation
58
LoC httpidlocgovtechcenter
C Harlow notes on reconciling LoC - httpsgithubcomcmh2166lc-reconcile
OCLC FAST
httpswwwoclcorgdeveloperdevelopweb-servicesfast-apilinked-dataenhtml
GeoNames
httpwwwgeonamesorgexportgeonames-searchhtml
AGROVOC httpaimsfaoorgvest-registryvocabulariesagrovoc-multilingual-agricultural-thesaurus
swagger config httpsgithubcomNatLibFiSkosmosblobmasterswaggerjson
NALT
httpsagclassnalusdagov
DBpedia httpwikidbpediaorgOnlineAccess1220Public20Faceted20Web20Service20Inter
face
Challenge 2 Linked Data Access API
59
for Search Query for Term Fetch
LoC not supported URI
OCLC FAST httpexperimentalworldcatorgfastsearchq
uery=subauth+all+22query22ampsortK
eys=usageampmaximumRecords=maximumR
ecords
URI
GeoNames httpapigeonamesorgsearchq=queryamp
maxRows=maxRowsampusername=userna
meamptype=rdf
URI
AGROVOC httpartemideartuniroma2it8081agrovocr
estv1searchquery=queryamplang=lang
httpartemideartuniroma2it8081agrovo
crestv1datauri=httpaimsfaoorgaosa
grovocterm_id
NALT httpskosmoslibrarycornelledurestv1nalt
searchquery=queryamplang=lang
httpskosmoslibrarycornelledurestv1na
ltdatauri=term_uri
DBpedia
Challenge 3 Varying Results Formats
60
for Search Query for Term Fetch
LoC not supported rdf-xml
OCLC FAST rdf-xml rdf-xml
GeoNames rdf-xml rdf-xml
AGROVOC json-ld rdf-xml json-ld turtle
NALT json-ld rdf-xml json-ld turtle
DBpedia
Challenge 4 Varying Ontologies
61
Primary Ontology Flat vs Navigation
required
LoC madsrdf
SKOS
navigation required
OCLC FAST schemaorg
SKOS
flat
GeoNames geonames flat
hierarchical
AGROVOC SKOS flat
hierarchical
NALT SKOS flat
hierarchical
DBpedia dbpedia flat
Configurations for Questioning Authority
62
LoC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_locconfigauthoritieslinked_dat
a
OCLC FAST httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_oclcfastconfigauthoritieslinked
_data
GeoNames httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_geonamesconfigauthoritieslink
ed_data
AGROVOC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_agrovocconfigauthoritieslinked
_data
NALT httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_naltconfigauthoritieslinked_dat
a
DBpedia httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_dbpediaconfigauthoritieslinked
_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
bull 8-core 64gb 3Ghz Mac Pro (late 2013) macOS Sierra
(10126)
bull 32tb Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
bull Apache Jena Fuseki 240 provides SPARQL endpoint
bull Apache Tomcat 90 runs custom web application(s)
bull Apache Lucene 36 provides search interface
64
Customizations
bull custom per-data-source JSP web application provides
searchbrowsedownload functionality
bull custom (generic) SPARQL Tag Library provides API for web
apps (available at httpsgithubcomeichmannlod-utilities)
bull custom (generic) Lucene Tag Library provides API for web apps
65
Loading a New Vocabulary
bull download RDF
bull if necessary convert to n-triples (required for GeoNames data for instance)
bull use tdbloader2 to populated triplestore
bull configure Fuseki server(s) with triplestore details
bull create new JSP project in Eclipse
bull write one or more indexer programs that populate Lucene indices and run indexer(s)
bull write searchbrowsedownload application logic using the SPARQL and Lucene tags
bull package project as war
bull deploy to Apache Tomcat server(s)
bull add new service to Apache HTTPD virtual host specification
66
UI Access to Cache Server
httpservicesld4lorgld4l_servicesloc_namejsp
Downloads
68
LoC httpidlocgovdownload (n-triples OR rdf-xml)
OCLC FAST httpwwwoclcorgresearchthemesdata-sciencefastdownloadhtml (n-triples)
GeoNames httpwwwgeonamesorgontologydocumentationhtml (custom format ndash see notes for processing)
AGROVOC httpsaims-faoatlassiannetwikispacesAGVpages2949126Releases (n-triples OR rdf-xml)
NALT httpsagclassnalusdagovdownloadshtml (rdf-xml)
DBpedia httpwikidbpediaorgdownloads-2016-04
Potential Options for Reconciliation
bull VIAF for name reconciliation ndash we are doing some
work with this
bull Wikidata ndash Ive heard that they are working on
Reconciliation issues but havent yet explored in
depth bull Intro Video (3hrs)
bull API Access
bull SPARQL ndash user manual
bull federated queries with other authorities
Doing a google search for linked data reconciliation
returns a large number of articles and presentations
on this concept
Links to Code amp More
bull QA Server - Code for a small app that provides the
Questioning Authority normalization layer
bull Linked Data Authorities - Configurations that can
be used with QA Server
bull LD4L Services - UI access to Cache Server
bull VitroLib - Code for the VitroLib cataloging tool
Selecting a Term using
Lookup with Context
26
Selecting a Term using
Lookup with Context
27
Getting more from the same authority
Getting more from other authorities
30
Architecture
Technical Motivation
bull Linked data provideshellip
bull URIs that identify specific terms (as opposed to ambiguity of using
strings)
bull Reconciliation to relate terms that are defined in separate authorities
bull Goals of implementationhellip
bull Provide a single process to access many authorities
bull Provide efficient and reliable access to authorities
bull Provide a means for disambiguation that empowers library staff to
make the most accurate selections
First Set of Challenges
1 Finding Documentation
2 Linked Data Access API eg no support partial support requires login credentials
sparql query endpoint only
3 Varying Results Formats eg rdf-xml json-ld turtle n-triples etc
4 Varying Ontologies eg SKOS schemaorg madsrdf dbpedia geonames
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
httpexperimentalworldcatorgfast
searchquery=oclcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
httpexperimentalworldcatorgfast
searchquery=oclcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
lthttpidworldcatorgfast31622gt
a schemaPerson
dctermsidentifier 31622
skosprefLabel Twain Mark 1835-1910
skosaltLabel Make Teviin 1835-1910
Make Tuwen 1835-1910
lthttpidworldcatorgfast365563gt
a schemaPerson
dctermsidentifier 365563
skosprefLabel Twain Shania
skosaltLabel Twain Eilleen
Edwards Eilleen
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
[urihttpidworldcatorgfast31622
id31622 labelTwain Mark 1835-1910
urihttpidworldcatorgfast365563
id365563labelTwain Shania ]
httpexperimentalworldcatorgfast
searchquery=oclcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
lthttpidworldcatorgfast31622gt
a schemaPerson
dctermsidentifier 31622
skosprefLabel Twain Mark 1835-1910
skosaltLabel Make Teviin 1835-1910
Make Tuwen 1835-1910
lthttpidworldcatorgfast365563gt
a schemaPerson
dctermsidentifier 365563
skosprefLabel Twain Shania
skosaltLabel Twain Eilleen
Edwards Eilleen
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Direct Access Query API
Direct against authorityhellip
httpexperimentalworldcatorgfastsearch
query=oclcpersonalName+22twain22
ampmaximumRecords=2
httpapigeonamesorgsearch
q=ithaca
ampmaxRows=2
ampusername=demo
amptype=rdf
httpartemideartuniroma2it8081agrovocrestv1search
query=milk
amplang=en
ampmaxhits=2
Normalized Query API
Through QA normalization layerhellip
httplocalhost3000qasearchlinked_dataoclc_fast
q=twain
ampmaxRecords=2
httplocalhost3000qasearchlinked_datageonames
q=ithaca
ampmaxRecords=2
httplocalhost3000qasearchlinked_dataagrovoc
q=milk
ampmaxRecords=2
amplang=en
Normalized Results
[urihttpidworldcatorgfast31622 id31622 labelTwain Mark 1835-1910 urihttpidworldcatorgfast365563 id365563 labelTwain Shania]
[uri httpswsgeonamesorg2162552 id httpswsgeonamesorg2162552 label Ithaca (AU) uri httpswsgeonamesorg4515289 id httpswsgeonamesorg4515289 label Ithaca (US)]
[uri httpaimsfaoorgaosagrovocc_8602 id httpaimsfaoorgaosagrovocc_8602 label acidophilus milk uri httpaimsfaoorgaosagrovocc_16076 id httpaimsfaoorgaosagrovocc_16076 label buffalo milk]
OCLC FAST GeoNames AgroVoc
Second Set of Challenges
5 Reliability amp Efficiency eg server uptime server load
6 Accuracy eg select results based on usage data lexical match
custom weighting other
7 Order Ranking eg How to order a graph
Cache Server Query Process
JSP Query API
Jena-Fuseki
Triplestore
One full setup per authority
LuceneSOLR
Index
Cache Server Query Process
JSP Query API
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
Jena-Fuseki
Triplestore
LuceneSOLR
Index
One full setup per authority
Cache Server Query Process
JSP Query API
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
Jena-Fuseki
Triplestore
LuceneSOLR
Index
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
extract search rank
extract URI
Jena-Fuseki
Triplestore
for each result
LuceneSOLR
Index
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
sparql query for URI
Jena-Fuseki
Triplestore
LuceneSOLR
Index
extract search rank
extract URI
for each result
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
combine all results
Jena-Fuseki
Triplestore
insert search rank in predicate
lthttpvivoweborgontology
corerankgt
LuceneSOLR
Index
sparql query for URI extract search rank
extract URI
for each result
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
UI-QA-Authority
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainampmaximumRecords=2
[urihttpidworldcatorgfast31622id31622
labelTwain Mark 1835-1910
urihttpidworldcatorgfast365563id365563
labelTwain Shania
httpexperimentalworldcatorgfastsearchquery=o
clcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
RDF of
search
results
Active-Triples
LDF Cache
(Marmotta or
Blazegraph) LDF Cache Jena-Fuseki-
Lucene
Cache
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
search of cache performed via Lucene index
Third Set of Challenges
8 Disambiguation through better context eg expand from just prefLabel tohellip
preLabel altLabel birthdeath dates occupation etc
9 Reconciliation across multiple sources eg match LoC URI to OCLC FAST URI
53
Whatrsquos next
Addressing Architectural Challenges
bull Generalize process for accessing context on the
cache server and in the normalization layer
bull Multi-authority search and reconciliation
bull Address the need for cache refresh
bull Mirrored cache servers
User Experience and Design
bull User-centered Design
bull Observe listen learn design evaluate iterate
bull Iteratively design and evaluate UI for lookupauthorities
with catalogers
bull Search result rankingorderingfiltering for catalogers
bull Additional UI platforms eg FOLIO
56
Questions
httptinyurlcomld4l-auth-access
Appendix for Challenges 1-4
Challenge 1 Documentation
58
LoC httpidlocgovtechcenter
C Harlow notes on reconciling LoC - httpsgithubcomcmh2166lc-reconcile
OCLC FAST
httpswwwoclcorgdeveloperdevelopweb-servicesfast-apilinked-dataenhtml
GeoNames
httpwwwgeonamesorgexportgeonames-searchhtml
AGROVOC httpaimsfaoorgvest-registryvocabulariesagrovoc-multilingual-agricultural-thesaurus
swagger config httpsgithubcomNatLibFiSkosmosblobmasterswaggerjson
NALT
httpsagclassnalusdagov
DBpedia httpwikidbpediaorgOnlineAccess1220Public20Faceted20Web20Service20Inter
face
Challenge 2 Linked Data Access API
59
for Search Query for Term Fetch
LoC not supported URI
OCLC FAST httpexperimentalworldcatorgfastsearchq
uery=subauth+all+22query22ampsortK
eys=usageampmaximumRecords=maximumR
ecords
URI
GeoNames httpapigeonamesorgsearchq=queryamp
maxRows=maxRowsampusername=userna
meamptype=rdf
URI
AGROVOC httpartemideartuniroma2it8081agrovocr
estv1searchquery=queryamplang=lang
httpartemideartuniroma2it8081agrovo
crestv1datauri=httpaimsfaoorgaosa
grovocterm_id
NALT httpskosmoslibrarycornelledurestv1nalt
searchquery=queryamplang=lang
httpskosmoslibrarycornelledurestv1na
ltdatauri=term_uri
DBpedia
Challenge 3 Varying Results Formats
60
for Search Query for Term Fetch
LoC not supported rdf-xml
OCLC FAST rdf-xml rdf-xml
GeoNames rdf-xml rdf-xml
AGROVOC json-ld rdf-xml json-ld turtle
NALT json-ld rdf-xml json-ld turtle
DBpedia
Challenge 4 Varying Ontologies
61
Primary Ontology Flat vs Navigation
required
LoC madsrdf
SKOS
navigation required
OCLC FAST schemaorg
SKOS
flat
GeoNames geonames flat
hierarchical
AGROVOC SKOS flat
hierarchical
NALT SKOS flat
hierarchical
DBpedia dbpedia flat
Configurations for Questioning Authority
62
LoC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_locconfigauthoritieslinked_dat
a
OCLC FAST httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_oclcfastconfigauthoritieslinked
_data
GeoNames httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_geonamesconfigauthoritieslink
ed_data
AGROVOC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_agrovocconfigauthoritieslinked
_data
NALT httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_naltconfigauthoritieslinked_dat
a
DBpedia httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_dbpediaconfigauthoritieslinked
_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
Selecting a Term using Lookup with Context
Getting more from the same authority
Getting more from other authorities
Architecture
Technical Motivation
• Linked data provides…
  • URIs that identify specific terms (as opposed to the ambiguity of using strings)
  • Reconciliation to relate terms that are defined in separate authorities
• Goals of implementation…
  • Provide a single process to access many authorities
  • Provide efficient and reliable access to authorities
  • Provide a means for disambiguation that empowers library staff to make the most accurate selections
First Set of Challenges
1. Finding documentation
2. Linked data access API, e.g. no support, partial support, requires login credentials, SPARQL query endpoint only
3. Varying results formats, e.g. rdf-xml, json-ld, turtle, n-triples, etc.
4. Varying ontologies, e.g. SKOS, schema.org, madsrdf, dbpedia, geonames
Multi-Server Architecture
• Hyrax/VitroLib – UI for selecting an entry from an authority
• QA – normalize RDF returned from an authority
• Direct access of external authority

Request from the UI to QA:
http://localhost:3000/qa/search/linked_data/oclc_fast/personal_name?q=twain&maximumRecords=2

Request from QA to the external authority:
http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&sortKeys=usage&maximumRecords=2

RDF returned by the authority:
<http://id.worldcat.org/fast/31622>
    a schema:Person ;
    dcterms:identifier "31622" ;
    skos:prefLabel "Twain, Mark, 1835-1910" ;
    skos:altLabel "Make Teviin, 1835-1910", "Make Tuwen, 1835-1910" .
<http://id.worldcat.org/fast/365563>
    a schema:Person ;
    dcterms:identifier "365563" ;
    skos:prefLabel "Twain, Shania" ;
    skos:altLabel "Twain, Eilleen", "Edwards, Eilleen" .

Normalized results returned by QA to the UI:
[{"uri": "http://id.worldcat.org/fast/31622", "id": "31622", "label": "Twain, Mark, 1835-1910"},
 {"uri": "http://id.worldcat.org/fast/365563", "id": "365563", "label": "Twain, Shania"}]
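The normalization step on this slide can be sketched in a few lines of Python. This is only an illustration, not the Questioning Authority code: the function name `normalize_results`, the tuple representation of triples, and the choice of `skos:prefLabel` and `dcterms:identifier` as the label and id predicates are assumptions made for the example.

```python
from collections import OrderedDict

SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
DCTERMS_IDENTIFIER = "http://purl.org/dc/terms/identifier"


def normalize_results(triples, label_predicate=SKOS_PREF_LABEL,
                      id_predicate=DCTERMS_IDENTIFIER):
    """Reduce (subject, predicate, object) triples to uri/id/label records."""
    records = OrderedDict()  # keeps subjects in first-seen order
    for subject, predicate, obj in triples:
        # Until an identifier triple is seen, the id defaults to the URI.
        record = records.setdefault(
            subject, {"uri": subject, "id": subject, "label": None})
        if predicate == label_predicate and record["label"] is None:
            record["label"] = obj
        elif predicate == id_predicate:
            record["id"] = obj
    # Subjects without a label are dropped; a picker cannot display them.
    return [r for r in records.values() if r["label"] is not None]


# Triples corresponding to the OCLC FAST example on the slide.
fast_triples = [
    ("http://id.worldcat.org/fast/31622", DCTERMS_IDENTIFIER, "31622"),
    ("http://id.worldcat.org/fast/31622", SKOS_PREF_LABEL, "Twain, Mark, 1835-1910"),
    ("http://id.worldcat.org/fast/365563", DCTERMS_IDENTIFIER, "365563"),
    ("http://id.worldcat.org/fast/365563", SKOS_PREF_LABEL, "Twain, Shania"),
]
results = normalize_results(fast_triples)
```

Falling back to the URI when no identifier triple is present mirrors the GeoNames and AgroVoc normalized results, where `id` and `uri` carry the same value.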
Direct Access Query API
Direct against authority…
http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&maximumRecords=2
http://api.geonames.org/search?q=ithaca&maxRows=2&username=demo&type=rdf
http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search?query=milk&lang=en&maxhits=2
Normalized Query API
Through QA normalization layer…
http://localhost:3000/qa/search/linked_data/oclc_fast?q=twain&maxRecords=2
http://localhost:3000/qa/search/linked_data/geonames?q=ithaca&maxRecords=2
http://localhost:3000/qa/search/linked_data/agrovoc?q=milk&maxRecords=2&lang=en
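One way to picture what the normalization layer does with a query is a per-authority table that maps the normalized parameter names onto each authority's own names. The sketch below is hypothetical: `AUTHORITIES` and `direct_url` are names invented for this example, and only the parameter names and base URLs are taken from the example URLs on these two slides. OCLC FAST is left out because its query value also wraps the search text in an index name and quotes.

```python
from urllib.parse import urlencode

# Hypothetical configuration: normalized parameter name -> authority's name,
# plus parameters the authority always needs.
AUTHORITIES = {
    "geonames": {
        "base": "http://api.geonames.org/search",
        "params": {"q": "q", "maxRecords": "maxRows"},
        "fixed": {"username": "demo", "type": "rdf"},
    },
    "agrovoc": {
        "base": "http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search",
        "params": {"q": "query", "maxRecords": "maxhits", "lang": "lang"},
        "fixed": {},
    },
}


def direct_url(authority, **normalized):
    """Translate normalized query parameters into an authority-specific URL."""
    config = AUTHORITIES[authority]
    query = {}
    for name, value in normalized.items():
        if name in config["params"]:  # silently ignore unsupported parameters
            query[config["params"][name]] = value
    query.update(config["fixed"])
    return config["base"] + "?" + urlencode(query)


geonames_url = direct_url("geonames", q="ithaca", maxRecords=2)
agrovoc_url = direct_url("agrovoc", q="milk", maxRecords=2, lang="en")
```

Keeping the differences in data rather than in code means a new authority can be added by writing a configuration entry, which is the role the Linked Data Authorities configurations play for QA.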
Normalized Results
OCLC FAST:
[{"uri": "http://id.worldcat.org/fast/31622", "id": "31622", "label": "Twain, Mark, 1835-1910"},
 {"uri": "http://id.worldcat.org/fast/365563", "id": "365563", "label": "Twain, Shania"}]
GeoNames:
[{"uri": "http://sws.geonames.org/2162552/", "id": "http://sws.geonames.org/2162552/", "label": "Ithaca (AU)"},
 {"uri": "http://sws.geonames.org/4515289/", "id": "http://sws.geonames.org/4515289/", "label": "Ithaca (US)"}]
AgroVoc:
[{"uri": "http://aims.fao.org/aos/agrovoc/c_8602", "id": "http://aims.fao.org/aos/agrovoc/c_8602", "label": "acidophilus milk"},
 {"uri": "http://aims.fao.org/aos/agrovoc/c_16076", "id": "http://aims.fao.org/aos/agrovoc/c_16076", "label": "buffalo milk"}]
Second Set of Challenges
5. Reliability & efficiency, e.g. server uptime, server load
6. Accuracy, e.g. select results based on usage data, lexical match, custom weighting, other
7. Order/ranking, e.g. how to order a graph
Cache Server Query Process
One full setup per authority: JSP Query API, Lucene/SOLR index, Jena-Fuseki triplestore.
1. The JSP Query API receives a request:
   http://services.ld4l.org/ld4l_services/loc_name_batch.jsp?query=ezra%20cornell&maxRecords=10
2. Lucene search for "ezra cornell"; the index is built with the values of the predicates <skos:prefLabel> and <skos:altLabel>
3. For each result: extract search rank, extract URI
4. SPARQL query for the URI against the Jena-Fuseki triplestore
5. Insert search rank in predicate <http://vivoweb.org/ontology/core#rank>
6. Combine all results
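The query process on this slide can be sketched with in-memory stand-ins for the two back ends. Nothing here is the cache server's actual code: `search_index` replaces the Lucene search with a simple word-overlap score, `describe` replaces the SPARQL query with a list filter, and the `example.org` URIs are placeholders. Only the rank predicate is taken from the slide.

```python
RANK = "http://vivoweb.org/ontology/core#rank"
PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def search_index(index, text, max_records):
    """Stand-in for the Lucene search: order URIs by number of matched words."""
    words = text.lower().split()
    scored = []
    for uri, labels in index.items():
        haystack = " ".join(labels).lower()
        score = sum(1 for word in words if word in haystack)
        if score:
            scored.append((score, uri))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
    return [uri for _, uri in scored[:max_records]]


def describe(store, uri):
    """Stand-in for the SPARQL query: every triple whose subject is uri."""
    return [triple for triple in store if triple[0] == uri]


def cache_query(index, store, text, max_records=10):
    """Search, fetch each hit's triples, attach its rank, combine the results."""
    combined = []
    hits = search_index(index, text, max_records)
    for rank, uri in enumerate(hits, start=1):
        triples = describe(store, uri)
        triples.append((uri, RANK, rank))  # the rank travels inside the graph
        combined.extend(triples)
    return combined


# Placeholder data; the index holds prefLabel/altLabel values per URI.
index = {
    "http://example.org/names/1": ["Cornell, Ezra, 1807-1874"],
    "http://example.org/names/2": ["Cornell University"],
}
store = [
    ("http://example.org/names/1", PREF_LABEL, "Cornell, Ezra, 1807-1874"),
    ("http://example.org/names/2", PREF_LABEL, "Cornell University"),
]
graph = cache_query(index, store, "ezra cornell")
```

Writing the rank into the graph as a triple is what lets a client recover the search order from an otherwise unordered set of RDF statements, which is the "how to order a graph" problem named in challenge 7.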
UI-QA-Authority
• Hyrax/VitroLib – UI for selecting an entry from an authority
  http://localhost:3000/qa/search/linked_data/oclc_fast/personal_name?q=twain&maximumRecords=2
• QA – normalize RDF returned from an authority
  [{"uri": "http://id.worldcat.org/fast/31622", "id": "31622", "label": "Twain, Mark, 1835-1910"},
   {"uri": "http://id.worldcat.org/fast/365563", "id": "365563", "label": "Twain, Shania"}]
• RDF of search results
  • Direct access of external authority
    http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&sortKeys=usage&maximumRecords=2
  • Active-Triples LDF Cache (Marmotta or Blazegraph)
  • LDF Cache
  • Jena-Fuseki-Lucene Cache (search of cache performed via Lucene index)
Third Set of Challenges
8. Disambiguation through better context, e.g. expand from just prefLabel to… prefLabel, altLabel, birth/death dates, occupation, etc.
9. Reconciliation across multiple sources, e.g. match LoC URI to OCLC FAST URI
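As a rough illustration of challenge 9, the sketch below matches entries from two authorities by comparing normalized labels. This is a deliberately naive approach chosen for the example, not a method described in the slides; real reconciliation would likely need more evidence than a label. The function names and the `example.org` URI standing in for a LoC entry are hypothetical.

```python
import re


def label_key(label):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", label.lower())
    return " ".join(cleaned.split())


def reconcile(source, target):
    """Map each source URI to the target URIs that share a normalized label."""
    by_key = {}
    for uri, label in target.items():
        by_key.setdefault(label_key(label), []).append(uri)
    return {uri: by_key.get(label_key(label), [])
            for uri, label in source.items()}


# Placeholder URI for the source authority; FAST URIs are from the slides.
source = {"http://example.org/loc/twain-mark": "Twain, Mark, 1835-1910"}
target = {
    "http://id.worldcat.org/fast/31622": "Twain, Mark, 1835-1910",
    "http://id.worldcat.org/fast/365563": "Twain, Shania",
}
matches = reconcile(source, target)
```

Returning a list per source URI keeps ambiguous matches visible, so that a cataloger rather than the code makes the final selection.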
What's next
Addressing Architectural Challenges
• Generalize process for accessing context on the cache server and in the normalization layer
• Multi-authority search and reconciliation
• Address the need for cache refresh
• Mirrored cache servers
User Experience and Design
• User-centered design
  • Observe, listen, learn, design, evaluate, iterate
• Iteratively design and evaluate UI for lookup/authorities with catalogers
• Search result ranking/ordering/filtering for catalogers
• Additional UI platforms, e.g. FOLIO
Questions
http://tinyurl.com/ld4l-auth-access
Appendix for Challenges 1-4
Challenge 1: Documentation
LoC: http://id.loc.gov/techcenter/
  C. Harlow notes on reconciling LoC - https://github.com/cmh2166/lc-reconcile
OCLC FAST: https://www.oclc.org/developer/develop/web-services/fast-api/linked-data.en.html
GeoNames: http://www.geonames.org/export/geonames-search.html
AGROVOC: http://aims.fao.org/vest-registry/vocabularies/agrovoc-multilingual-agricultural-thesaurus
  swagger config: https://github.com/NatLibFi/Skosmos/blob/master/swagger.json
NALT: https://agclass.nal.usda.gov/
DBpedia: http://wiki.dbpedia.org/OnlineAccess#1.2%20Public%20Faceted%20Web%20Service%20Interface
Challenge 2: Linked Data Access API
LoC
  for search query: not supported
  for term fetch: URI
OCLC FAST
  for search query: http://experimental.worldcat.org/fast/search?query={subauth}+all+%22{query}%22&sortKeys=usage&maximumRecords={maximumRecords}
  for term fetch: URI
GeoNames
  for search query: http://api.geonames.org/search?q={query}&maxRows={maxRows}&username={username}&type=rdf
  for term fetch: URI
AGROVOC
  for search query: http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search?query={query}&lang={lang}
  for term fetch: http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/data?uri=http://aims.fao.org/aos/agrovoc/{term_id}
NALT
  for search query: http://skosmos.library.cornell.edu/rest/v1/nalt/search?query={query}&lang={lang}
  for term fetch: http://skosmos.library.cornell.edu/rest/v1/nalt/data?uri={term_uri}
DBpedia
Challenge 3: Varying Results Formats
Authority: format for search query; format for term fetch
LoC: not supported; rdf-xml
OCLC FAST: rdf-xml; rdf-xml
GeoNames: rdf-xml; rdf-xml
AGROVOC: json-ld; rdf-xml, json-ld, turtle
NALT: json-ld; rdf-xml, json-ld, turtle
DBpedia:
Challenge 4: Varying Ontologies
Authority: primary ontology; flat vs. navigation required
LoC: madsrdf, SKOS; navigation required
OCLC FAST: schema.org, SKOS; flat
GeoNames: geonames; flat, hierarchical
AGROVOC: SKOS; flat, hierarchical
NALT: SKOS; flat, hierarchical
DBpedia: dbpedia; flat
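Because each ontology names its label property differently, a normalization layer needs some lookup from ontology to label predicate. The sketch below shows one possible shape for that lookup. The predicate URIs are the standard ones for each vocabulary, but which predicate a given authority actually uses for its display label is an assumption here (in particular `rdfs:label` for DBpedia), and `find_label` is a hypothetical helper.

```python
# Label predicate per ontology (assumed choices; verify against each authority).
LABEL_PREDICATES = {
    "madsrdf": "http://www.loc.gov/mads/rdf/v1#authoritativeLabel",
    "skos": "http://www.w3.org/2004/02/skos/core#prefLabel",
    "schema.org": "http://schema.org/name",
    "geonames": "http://www.geonames.org/ontology#name",
    "dbpedia": "http://www.w3.org/2000/01/rdf-schema#label",
}


def find_label(triples, ontologies):
    """Return the first label found, trying the ontologies in the given order."""
    for ontology in ontologies:
        predicate = LABEL_PREDICATES[ontology]
        for _, triple_predicate, obj in triples:
            if triple_predicate == predicate:
                return obj
    return None


# A record that, like LoC in the table above, carries both madsrdf and SKOS.
record = [
    ("http://example.org/authority/1", LABEL_PREDICATES["skos"], "Twain, Mark (SKOS label)"),
    ("http://example.org/authority/1", LABEL_PREDICATES["madsrdf"], "Twain, Mark (MADS label)"),
]
label = find_label(record, ["madsrdf", "skos"])
```

The order of the `ontologies` argument expresses a preference, so an authority that publishes two vocabularies can name its primary one first.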
Configurations for Questioning Authority
LoC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_loc/config/authorities/linked_data
OCLC FAST: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_oclcfast/config/authorities/linked_data
GeoNames: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_geonames/config/authorities/linked_data
AGROVOC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_agrovoc/config/authorities/linked_data
NALT: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_nalt/config/authorities/linked_data
DBpedia: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_dbpedia/config/authorities/linked_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
• 8-core, 64 GB, 3 GHz Mac Pro (late 2013), macOS Sierra (10.12.6)
• 32 TB Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
• Apache Jena Fuseki 2.4.0 provides SPARQL endpoint
• Apache Tomcat 9.0 runs custom web application(s)
• Apache Lucene 3.6 provides search interface
Customizations
• custom per-data-source JSP web application provides search/browse/download functionality
• custom (generic) SPARQL Tag Library provides API for web apps (available at https://github.com/eichmann/lod-utilities)
• custom (generic) Lucene Tag Library provides API for web apps
Loading a New Vocabulary
• download RDF
• if necessary, convert to n-triples (required for GeoNames data, for instance)
• use tdbloader2 to populate the triplestore
• configure Fuseki server(s) with triplestore details
• create new JSP project in Eclipse
• write one or more indexer programs that populate Lucene indices, and run the indexer(s)
• write search/browse/download application logic using the SPARQL and Lucene tags
• package project as WAR
• deploy to Apache Tomcat server(s)
• add new service to Apache HTTPD virtual host specification
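The indexer step in this list can be illustrated with a small stand-in. The sketch below is not a Lucene indexer and not the project's code: it builds a plain word-to-URI index in memory from N-Triples lines, using only the `skos:prefLabel` and `skos:altLabel` values as the cache server slides describe. The regular expression handles only simple literal objects, and the `example.org` URIs are placeholders.

```python
import re
from collections import defaultdict

INDEXED = {
    "http://www.w3.org/2004/02/skos/core#prefLabel",
    "http://www.w3.org/2004/02/skos/core#altLabel",
}
# Simple case only: <subject> <predicate> "literal", optionally followed by a
# language tag or datatype. Escape sequences in the literal are not decoded.
LINE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+"((?:[^"\\]|\\.)*)"\S*\s*\.\s*$')


def build_index(ntriples_lines):
    """Map each lowercase word in an indexed label to the URIs that use it."""
    index = defaultdict(set)
    for line in ntriples_lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # blank lines, comments, and non-literal objects
        subject, predicate, literal = match.groups()
        if predicate in INDEXED:
            for word in re.findall(r"\w+", literal.lower()):
                index[word].add(subject)
    return index


lines = [
    '<http://example.org/names/1> <http://www.w3.org/2004/02/skos/core#prefLabel> "Cornell, Ezra, 1807-1874" .',
    '<http://example.org/names/1> <http://www.w3.org/2004/02/skos/core#altLabel> "Ezra Cornell"@en .',
    '<http://example.org/names/1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .',
    '<http://example.org/names/2> <http://www.w3.org/2004/02/skos/core#prefLabel> "Cornell University" .',
]
index = build_index(lines)
```

Indexing only label predicates keeps the search index small while the full description of each term stays in the triplestore, to be fetched by URI after the search.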
UI Access to Cache Server
http://services.ld4l.org/ld4l_services/loc_name.jsp
Downloads
LoC: http://id.loc.gov/download/ (n-triples OR rdf-xml)
OCLC FAST: http://www.oclc.org/research/themes/data-science/fast/download.html (n-triples)
GeoNames: http://www.geonames.org/ontology/documentation.html (custom format – see notes for processing)
AGROVOC: https://aims-fao.atlassian.net/wiki/spaces/AGV/pages/2949126/Releases (n-triples OR rdf-xml)
NALT: https://agclass.nal.usda.gov/download.shtml (rdf-xml)
DBpedia: http://wiki.dbpedia.org/downloads-2016-04
Potential Options for Reconciliation
• VIAF for name reconciliation – we are doing some work with this
• Wikidata – I've heard that they are working on reconciliation issues, but haven't yet explored in depth
  • Intro Video (3 hrs)
  • API Access
  • SPARQL – user manual
  • federated queries with other authorities
A Google search for "linked data reconciliation" returns a large number of articles and presentations on this concept.
Links to Code & More
• QA Server - Code for a small app that provides the Questioning Authority normalization layer
• Linked Data Authorities - Configurations that can be used with QA Server
• LD4L Services - UI access to Cache Server
• VitroLib - Code for the VitroLib cataloging tool
URI
AGROVOC httpartemideartuniroma2it8081agrovocr
estv1searchquery=queryamplang=lang
httpartemideartuniroma2it8081agrovo
crestv1datauri=httpaimsfaoorgaosa
grovocterm_id
NALT httpskosmoslibrarycornelledurestv1nalt
searchquery=queryamplang=lang
httpskosmoslibrarycornelledurestv1na
ltdatauri=term_uri
DBpedia
Challenge 3 Varying Results Formats
60
for Search Query for Term Fetch
LoC not supported rdf-xml
OCLC FAST rdf-xml rdf-xml
GeoNames rdf-xml rdf-xml
AGROVOC json-ld rdf-xml json-ld turtle
NALT json-ld rdf-xml json-ld turtle
DBpedia
Challenge 4 Varying Ontologies
61
Primary Ontology Flat vs Navigation
required
LoC madsrdf
SKOS
navigation required
OCLC FAST schemaorg
SKOS
flat
GeoNames geonames flat
hierarchical
AGROVOC SKOS flat
hierarchical
NALT SKOS flat
hierarchical
DBpedia dbpedia flat
Configurations for Questioning Authority
62
LoC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_locconfigauthoritieslinked_dat
a
OCLC FAST httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_oclcfastconfigauthoritieslinked
_data
GeoNames httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_geonamesconfigauthoritieslink
ed_data
AGROVOC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_agrovocconfigauthoritieslinked
_data
NALT httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_naltconfigauthoritieslinked_dat
a
DBpedia httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_dbpediaconfigauthoritieslinked
_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
bull 8-core 64gb 3Ghz Mac Pro (late 2013) macOS Sierra
(10126)
bull 32tb Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
bull Apache Jena Fuseki 240 provides SPARQL endpoint
bull Apache Tomcat 90 runs custom web application(s)
bull Apache Lucene 36 provides search interface
64
Customizations
bull custom per-data-source JSP web application provides
searchbrowsedownload functionality
bull custom (generic) SPARQL Tag Library provides API for web
apps (available at httpsgithubcomeichmannlod-utilities)
bull custom (generic) Lucene Tag Library provides API for web apps
65
Loading a New Vocabulary
bull download RDF
bull if necessary convert to n-triples (required for GeoNames data for instance)
bull use tdbloader2 to populated triplestore
bull configure Fuseki server(s) with triplestore details
bull create new JSP project in Eclipse
bull write one or more indexer programs that populate Lucene indices and run indexer(s)
bull write searchbrowsedownload application logic using the SPARQL and Lucene tags
bull package project as war
bull deploy to Apache Tomcat server(s)
bull add new service to Apache HTTPD virtual host specification
66
UI Access to Cache Server
httpservicesld4lorgld4l_servicesloc_namejsp
Downloads
68
LoC httpidlocgovdownload (n-triples OR rdf-xml)
OCLC FAST httpwwwoclcorgresearchthemesdata-sciencefastdownloadhtml (n-triples)
GeoNames httpwwwgeonamesorgontologydocumentationhtml (custom format ndash see notes for processing)
AGROVOC httpsaims-faoatlassiannetwikispacesAGVpages2949126Releases (n-triples OR rdf-xml)
NALT httpsagclassnalusdagovdownloadshtml (rdf-xml)
DBpedia httpwikidbpediaorgdownloads-2016-04
Potential Options for Reconciliation
bull VIAF for name reconciliation ndash we are doing some
work with this
bull Wikidata ndash Ive heard that they are working on
Reconciliation issues but havent yet explored in
depth bull Intro Video (3hrs)
bull API Access
bull SPARQL ndash user manual
bull federated queries with other authorities
Doing a google search for linked data reconciliation
returns a large number of articles and presentations
on this concept
Links to Code amp More
bull QA Server - Code for a small app that provides the
Questioning Authority normalization layer
bull Linked Data Authorities - Configurations that can
be used with QA Server
bull LD4L Services - UI access to Cache Server
bull VitroLib - Code for the VitroLib cataloging tool
30
Architecture
Technical Motivation
• Linked data provides…
  • URIs that identify specific terms (as opposed to ambiguity of using strings)
  • Reconciliation to relate terms that are defined in separate authorities
• Goals of implementation…
  • Provide a single process to access many authorities
  • Provide efficient and reliable access to authorities
  • Provide a means for disambiguation that empowers library staff to make the most accurate selections
First Set of Challenges
1. Finding Documentation
2. Linked Data Access API, e.g., no support, partial support, requires login credentials, SPARQL query endpoint only
3. Varying Results Formats, e.g., rdf-xml, json-ld, turtle, n-triples, etc.
4. Varying Ontologies, e.g., SKOS, schema.org, madsrdf, dbpedia, geonames
Multi-Server Architecture
Hyrax/VitroLib – UI for selecting an entry from an authority
  Request sent to QA:
  http://localhost:3000/qa/search/linked_data/oclc_fast/personal_name?q=twain&maximumRecords=2
  Normalized results returned by QA:
  [{"uri":"http://id.worldcat.org/fast/31622","id":"31622","label":"Twain, Mark, 1835-1910"},
   {"uri":"http://id.worldcat.org/fast/365563","id":"365563","label":"Twain, Shania"}]
QA – normalize RDF returned from an authority
  Request sent to the authority:
  http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&sortKeys=usage&maximumRecords=2
  RDF returned by the authority:
  <http://id.worldcat.org/fast/31622>
    a schema:Person ;
    dcterms:identifier "31622" ;
    skos:prefLabel "Twain, Mark, 1835-1910" ;
    skos:altLabel "Make Teviin, 1835-1910", "Make Tuwen, 1835-1910" .
  <http://id.worldcat.org/fast/365563>
    a schema:Person ;
    dcterms:identifier "365563" ;
    skos:prefLabel "Twain, Shania" ;
    skos:altLabel "Twain, Eilleen", "Edwards, Eilleen" .
Direct Access of External Authority
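As a rough illustration of what the normalization step on this slide does, the sketch below reduces a handful of triples to uri/id/label records. It is not the Questioning Authority implementation. The tuple representation of the RDF, the `normalize` function, and the prefixed predicate names are assumptions made for this example.

```python
import json

# Hypothetical in-memory stand-in for the RDF returned by an authority:
# (subject, predicate, object) tuples with prefixed predicate names.
TRIPLES = [
    ("http://id.worldcat.org/fast/31622", "dcterms:identifier", "31622"),
    ("http://id.worldcat.org/fast/31622", "skos:prefLabel", "Twain, Mark, 1835-1910"),
    ("http://id.worldcat.org/fast/31622", "skos:altLabel", "Make Teviin, 1835-1910"),
    ("http://id.worldcat.org/fast/365563", "dcterms:identifier", "365563"),
    ("http://id.worldcat.org/fast/365563", "skos:prefLabel", "Twain, Shania"),
]


def normalize(triples, id_pred="dcterms:identifier", label_pred="skos:prefLabel"):
    """Collapse triples into one {uri, id, label} record per subject."""
    records = {}
    for subject, predicate, obj in triples:
        # The id falls back to the URI when the authority supplies no identifier.
        record = records.setdefault(subject, {"uri": subject, "id": subject, "label": ""})
        if predicate == id_pred:
            record["id"] = obj
        elif predicate == label_pred:
            record["label"] = obj
    # dicts preserve insertion order, so records keep the authority's order
    return list(records.values())


results = normalize(TRIPLES)
print(json.dumps(results, indent=2))
```

Predicates other than the identifier and preferred label are ignored here, so every authority yields the same three keys regardless of its ontology.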
Direct Access Query API
Direct against authority…
http://experimental.worldcat.org/fast/search
  ?query=oclc.personalName+%22twain%22
  &maximumRecords=2
http://api.geonames.org/search
  ?q=ithaca
  &maxRows=2
  &username=demo
  &type=rdf
http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search
  ?query=milk
  &lang=en
  &maxhits=2
Normalized Query API
Through QA normalization layer…
http://localhost:3000/qa/search/linked_data/oclc_fast
  ?q=twain
  &maxRecords=2
http://localhost:3000/qa/search/linked_data/geonames
  ?q=ithaca
  &maxRecords=2
http://localhost:3000/qa/search/linked_data/agrovoc
  ?q=milk
  &maxRecords=2
  &lang=en
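One way to picture the translation from a single normalized query to each authority's own query syntax is a small per-authority table. The sketch below is only an illustration: the endpoints and parameter names come from the slides, while the `AUTHORITIES` layout and the `direct_url` helper are assumptions and do not reflect how QA is configured internally. No request is sent; the code only builds URLs.

```python
from urllib.parse import urlencode

# Hypothetical per-authority settings. Endpoints and parameter names are
# taken from the slides; the dict layout is an assumption for this sketch.
AUTHORITIES = {
    "oclc_fast": {
        "endpoint": "http://experimental.worldcat.org/fast/search",
        "params": lambda q, n: {"query": f'oclc.personalName "{q}"', "maximumRecords": n},
    },
    "geonames": {
        "endpoint": "http://api.geonames.org/search",
        "params": lambda q, n: {"q": q, "maxRows": n, "username": "demo", "type": "rdf"},
    },
    "agrovoc": {
        "endpoint": "http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search",
        "params": lambda q, n: {"query": q, "lang": "en", "maxhits": n},
    },
}


def direct_url(authority, q, max_records):
    """Translate a normalized (q, max_records) query into the authority's URL."""
    cfg = AUTHORITIES[authority]
    return cfg["endpoint"] + "?" + urlencode(cfg["params"](q, max_records))


fast_url = direct_url("oclc_fast", "twain", 2)
geonames_url = direct_url("geonames", "ithaca", 2)
agrovoc_url = direct_url("agrovoc", "milk", 2)
print(fast_url, geonames_url, agrovoc_url, sep="\n")
```

The caller only ever supplies the query string and a record limit; the differences between `maximumRecords`, `maxRows`, and `maxhits` stay inside the table.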
Normalized Results
OCLC FAST:
[{"uri":"http://id.worldcat.org/fast/31622","id":"31622","label":"Twain, Mark, 1835-1910"},
 {"uri":"http://id.worldcat.org/fast/365563","id":"365563","label":"Twain, Shania"}]
GeoNames:
[{"uri":"http://sws.geonames.org/2162552","id":"http://sws.geonames.org/2162552","label":"Ithaca (AU)"},
 {"uri":"http://sws.geonames.org/4515289","id":"http://sws.geonames.org/4515289","label":"Ithaca (US)"}]
AGROVOC:
[{"uri":"http://aims.fao.org/aos/agrovoc/c_8602","id":"http://aims.fao.org/aos/agrovoc/c_8602","label":"acidophilus milk"},
 {"uri":"http://aims.fao.org/aos/agrovoc/c_16076","id":"http://aims.fao.org/aos/agrovoc/c_16076","label":"buffalo milk"}]
Second Set of Challenges
5. Reliability & Efficiency, e.g., server uptime, server load
6. Accuracy, e.g., select results based on usage data, lexical match, custom weighting, other
7. Order Ranking, e.g., How to order a graph?
Cache Server Query Process
One full setup per authority: JSP Query API, Lucene/SOLR Index, Jena-Fuseki Triplestore
1. Request to the JSP Query API:
   http://services.ld4l.org/ld4l_services/loc_name_batch.jsp?query=ezra%20cornell&maxRecords=10
2. Lucene search for "ezra cornell" (index built with the values of the predicates <skos:prefLabel> and <skos:altLabel>)
3. For each result: extract URI, extract search rank, SPARQL query for URI
4. Insert search rank in predicate <http://vivoweb.org/ontology/core#rank>
5. Combine all results
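The query process on this slide can be sketched in a few lines. This is only a toy model under stated assumptions: plain dicts stand in for the Lucene index and the Jena Fuseki triplestore, the scoring is naive term overlap rather than Lucene's, and the `example.org` URIs and function names are hypothetical.

```python
RANK = "http://vivoweb.org/ontology/core#rank"

# Hypothetical stand-ins: a label index (in place of Lucene) and a
# triplestore (in place of Jena Fuseki), both as plain dicts.
LABEL_INDEX = {
    "http://example.org/names/1": ["Cornell, Ezra, 1807-1874"],
    "http://example.org/names/2": ["Cornell University"],
    "http://example.org/names/3": ["Twain, Mark, 1835-1910"],
}
TRIPLESTORE = {
    uri: [(uri, "skos:prefLabel", labels[0])] for uri, labels in LABEL_INDEX.items()
}


def label_search(query, max_records):
    """Naive term-overlap scoring; returns [(uri, rank)] with rank 1 as best."""
    terms = query.lower().split()
    scored = []
    for uri, labels in LABEL_INDEX.items():
        score = max(sum(t in label.lower() for t in terms) for label in labels)
        if score:
            scored.append((score, uri))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(uri, rank) for rank, (_, uri) in enumerate(scored[:max_records], start=1)]


def cache_query(query, max_records=10):
    combined = []
    for uri, rank in label_search(query, max_records):
        triples = list(TRIPLESTORE[uri])   # stands in for the SPARQL query for the URI
        triples.append((uri, RANK, rank))  # insert the search rank as a predicate
        combined.extend(triples)           # combine all results
    return combined


graph = cache_query("ezra cornell", max_records=10)
for triple in graph:
    print(triple)
```

Carrying the rank inside the returned graph is what lets a downstream consumer restore the search order, since a graph by itself has no ordering.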
UI-QA-Authority
Hyrax/VitroLib – UI for selecting an entry from an authority
  Request sent to QA:
  http://localhost:3000/qa/search/linked_data/oclc_fast/personal_name?q=twain&maximumRecords=2
  Normalized results returned by QA:
  [{"uri":"http://id.worldcat.org/fast/31622","id":"31622","label":"Twain, Mark, 1835-1910"},
   {"uri":"http://id.worldcat.org/fast/365563","id":"365563","label":"Twain, Shania"}]
QA – normalize RDF returned from an authority
  Request sent to the authority:
  http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&sortKeys=usage&maximumRecords=2
  RDF of search results
Sources of RDF:
• Direct Access of External Authority
• Active-Triples LDF Cache (Marmotta or Blazegraph)
• Jena-Fuseki-Lucene Cache (search of cache performed via Lucene index)
Third Set of Challenges
8. Disambiguation through better context, e.g., expand from just prefLabel to… prefLabel, altLabel, birth/death dates, occupation, etc.
9. Reconciliation across multiple sources, e.g., match LoC URI to OCLC FAST URI
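To make the disambiguation challenge concrete, here is a minimal sketch of building a display string from more than the preferred label. The field names (`prefLabel`, `altLabels`, `dates`, `occupation`) and the `display_label` helper are assumptions for illustration; the sample values are the two FAST entries used elsewhere in these slides.

```python
def display_label(entry):
    """Build a disambiguating display string from whatever context is present."""
    parts = [entry["prefLabel"]]
    alt_labels = entry.get("altLabels", [])
    if alt_labels:
        parts.append("also known as: " + "; ".join(alt_labels))
    # Optional context fields; skipped silently when an authority lacks them.
    for field in ("dates", "occupation"):
        if entry.get(field):
            parts.append(entry[field])
    return " | ".join(parts)


entries = [
    {
        "prefLabel": "Twain, Mark, 1835-1910",
        "altLabels": ["Make Teviin, 1835-1910", "Make Tuwen, 1835-1910"],
    },
    {
        "prefLabel": "Twain, Shania",
        "altLabels": ["Twain, Eilleen", "Edwards, Eilleen"],
    },
]
labels = [display_label(e) for e in entries]
for label in labels:
    print(label)
```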
What's next
Addressing Architectural Challenges
• Generalize process for accessing context on the cache server and in the normalization layer
• Multi-authority search and reconciliation
• Address the need for cache refresh
• Mirrored cache servers
User Experience and Design
• User-centered Design
• Observe, listen, learn, design, evaluate, iterate
• Iteratively design and evaluate UI for lookup/authorities with catalogers
• Search result ranking/ordering/filtering for catalogers
• Additional UI platforms, e.g., FOLIO
Questions
http://tinyurl.com/ld4l-auth-access
Appendix for Challenges 1-4
Challenge 1: Documentation
LoC: http://id.loc.gov/techcenter/
  C. Harlow notes on reconciling LoC: https://github.com/cmh2166/lc-reconcile
OCLC FAST: https://www.oclc.org/developer/develop/web-services/fast-api/linked-data.en.html
GeoNames: http://www.geonames.org/export/geonames-search.html
AGROVOC: http://aims.fao.org/vest-registry/vocabularies/agrovoc-multilingual-agricultural-thesaurus
  swagger config: https://github.com/NatLibFi/Skosmos/blob/master/swagger.json
NALT: https://agclass.nal.usda.gov
DBpedia: http://wiki.dbpedia.org/OnlineAccess#1.2%20Public%20Faceted%20Web%20Service%20Interface
Challenge 2: Linked Data Access API
LoC
  Search query: not supported
  Term fetch: URI
OCLC FAST
  Search query: http://experimental.worldcat.org/fast/search?query={subauth}+all+%22{query}%22&sortKeys=usage&maximumRecords={maximumRecords}
  Term fetch: URI
GeoNames
  Search query: http://api.geonames.org/search?q={query}&maxRows={maxRows}&username={username}&type=rdf
  Term fetch: URI
AGROVOC
  Search query: http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search?query={query}&lang={lang}
  Term fetch: http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/data?uri=http://aims.fao.org/aos/agrovoc/{term_id}
NALT
  Search query: http://skosmos.library.cornell.edu/rest/v1/nalt/search?query={query}&lang={lang}
  Term fetch: http://skosmos.library.cornell.edu/rest/v1/nalt/data?uri={term_uri}
DBpedia
Challenge 3: Varying Results Formats
LoC: search query not supported; term fetch rdf-xml
OCLC FAST: search query rdf-xml; term fetch rdf-xml
GeoNames: search query rdf-xml; term fetch rdf-xml
AGROVOC: search query json-ld; term fetch rdf-xml, json-ld, turtle
NALT: search query json-ld; term fetch rdf-xml, json-ld, turtle
DBpedia
Challenge 4: Varying Ontologies
Authority: primary ontology; flat vs. navigation required
LoC: madsrdf, SKOS; navigation required
OCLC FAST: schema.org, SKOS; flat
GeoNames: geonames; flat, hierarchical
AGROVOC: SKOS; flat, hierarchical
NALT: SKOS; flat, hierarchical
DBpedia: dbpedia; flat
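Because each authority leans on a different ontology, a normalization layer needs to know which predicate carries the display label for each one. The sketch below shows one possible lookup. The authority-to-ontology mapping follows the table above; the choice of a single label predicate per ontology is an assumption for illustration and is not taken from the QA configurations.

```python
# Assumed label predicate for each ontology named on the slide.
LABEL_PREDICATES = {
    "madsrdf": "http://www.loc.gov/mads/rdf/v1#authoritativeLabel",
    "SKOS": "http://www.w3.org/2004/02/skos/core#prefLabel",
    "schema.org": "http://schema.org/name",
    "geonames": "http://www.geonames.org/ontology#name",
    "dbpedia": "http://www.w3.org/2000/01/rdf-schema#label",
}

# Primary ontologies per authority, in the order listed on the slide.
PRIMARY_ONTOLOGIES = {
    "LoC": ["madsrdf", "SKOS"],
    "OCLC FAST": ["schema.org", "SKOS"],
    "GeoNames": ["geonames"],
    "AGROVOC": ["SKOS"],
    "NALT": ["SKOS"],
    "DBpedia": ["dbpedia"],
}


def label_predicates_for(authority):
    """Return candidate label predicates to try, in priority order."""
    return [LABEL_PREDICATES[name] for name in PRIMARY_ONTOLOGIES[authority]]


loc_predicates = label_predicates_for("LoC")
print(loc_predicates)
```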
Configurations for Questioning Authority
LoC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_loc/config/authorities/linked_data
OCLC FAST: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_oclcfast/config/authorities/linked_data
GeoNames: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_geonames/config/authorities/linked_data
AGROVOC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_agrovoc/config/authorities/linked_data
NALT: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_nalt/config/authorities/linked_data
DBpedia: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_dbpedia/config/authorities/linked_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
• 8-core, 64 GB, 3 GHz Mac Pro (late 2013), macOS Sierra (10.12.6)
• 32 TB Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
• Apache Jena Fuseki 2.4.0 provides SPARQL endpoint
• Apache Tomcat 9.0 runs custom web application(s)
• Apache Lucene 3.6 provides search interface
Customizations
• custom per-data-source JSP web application provides search/browse/download functionality
• custom (generic) SPARQL Tag Library provides API for web apps (available at https://github.com/eichmann/lod-utilities)
• custom (generic) Lucene Tag Library provides API for web apps
Loading a New Vocabulary
• download RDF
• if necessary, convert to n-triples (required for GeoNames data, for instance)
• use tdbloader2 to populate the triplestore
• configure Fuseki server(s) with triplestore details
• create new JSP project in Eclipse
• write one or more indexer programs that populate Lucene indices, and run the indexer(s)
• write search/browse/download application logic using the SPARQL and Lucene tags
• package project as a WAR
• deploy to Apache Tomcat server(s)
• add new service to Apache HTTPD virtual host specification
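The triplestore steps in the list above come down to two command-line invocations. The sketch below only assembles them as argument lists and does not run anything. The file name, the TDB directory, and the dataset name are hypothetical placeholders, and the flags shown (`--loc` for both `tdbloader2` and `fuseki-server`) should be checked against the installed Jena version.

```python
import shlex


def load_commands(nt_file, tdb_dir, dataset_name):
    """Assemble the bulk-load and server-start commands for one vocabulary."""
    return [
        # Bulk-load the n-triples file into a TDB database directory.
        ["tdbloader2", "--loc", tdb_dir, nt_file],
        # Serve that TDB database as a SPARQL endpoint under /<dataset_name>.
        ["fuseki-server", f"--loc={tdb_dir}", f"/{dataset_name}"],
    ]


# Hypothetical paths and dataset name, for illustration only.
commands = load_commands("loc_names.nt", "/data/tdb/loc_names", "loc_names")
printable = [shlex.join(argv) for argv in commands]
for line in printable:
    print(line)
```

Keeping the commands as argument lists makes it straightforward to pass them to `subprocess.run` later without shell quoting problems.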
UI Access to Cache Server
http://services.ld4l.org/ld4l_services/loc_name.jsp
Downloads
LoC: http://id.loc.gov/download/ (n-triples OR rdf-xml)
OCLC FAST: http://www.oclc.org/research/themes/data-science/fast/download.html (n-triples)
GeoNames: http://www.geonames.org/ontology/documentation.html (custom format – see notes for processing)
AGROVOC: https://aims-fao.atlassian.net/wiki/spaces/AGV/pages/2949126/Releases (n-triples OR rdf-xml)
NALT: https://agclass.nal.usda.gov/download.shtml (rdf-xml)
DBpedia: http://wiki.dbpedia.org/downloads-2016-04
Potential Options for Reconciliation
• VIAF for name reconciliation – we are doing some work with this
• Wikidata – I've heard that they are working on reconciliation issues, but haven't yet explored in depth
  • Intro Video (3 hrs)
  • API Access
  • SPARQL – user manual
  • federated queries with other authorities
Doing a Google search for "linked data reconciliation" returns a large number of articles and presentations on this concept.
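For a sense of what the simplest form of reconciliation looks like, the sketch below matches entries from two authorities by a normalized form of their labels. This is a deliberately naive baseline, not the approach used by VIAF or Wikidata and not something the project has adopted. The LoC-side URI is a hypothetical `example.org` placeholder, and the function names are assumptions.

```python
import unicodedata


def label_key(label):
    """Case-fold and strip accents and punctuation so near-identical headings compare equal."""
    decomposed = unicodedata.normalize("NFKD", label)
    kept = [c for c in decomposed if c.isalnum() or c.isspace()]
    return " ".join("".join(kept).casefold().split())


def reconcile(source_a, source_b):
    """Map each URI in source_a to the URI in source_b with the same label key."""
    by_key = {label_key(r["label"]): r["uri"] for r in source_b}
    return {
        r["uri"]: by_key[label_key(r["label"])]
        for r in source_a
        if label_key(r["label"]) in by_key
    }


# Hypothetical LoC-side record; the FAST records are the ones from the slides.
loc_results = [{"uri": "http://example.org/loc/twain-mark", "label": "Twain, Mark, 1835-1910"}]
fast_results = [
    {"uri": "http://id.worldcat.org/fast/31622", "label": "Twain Mark 1835-1910"},
    {"uri": "http://id.worldcat.org/fast/365563", "label": "Twain, Shania"},
]
matches = reconcile(loc_results, fast_results)
print(matches)
```

Exact matching on a normalized label misses variant headings and can conflate distinct people who share a name, which is why dedicated services are listed as the options worth exploring.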
Links to Code & More
• QA Server - Code for a small app that provides the Questioning Authority normalization layer
• Linked Data Authorities - Configurations that can be used with QA Server
• LD4L Services - UI access to Cache Server
• VitroLib - Code for the VitroLib cataloging tool
First Set of Challenges
1 Finding Documentation
2 Linked Data Access API eg no support partial support requires login credentials
sparql query endpoint only
3 Varying Results Formats eg rdf-xml json-ld turtle n-triples etc
4 Varying Ontologies eg SKOS schemaorg madsrdf dbpedia geonames
Multi-Server Architecture
Hyrax/VitroLib – UI for selecting an entry from an authority
  Request sent to QA:
    http://localhost:3000/qa/search/linked_data/oclc_fast/personal_name?q=twain&maximumRecords=2
QA – normalizes RDF returned from an authority
  Direct access of external authority:
    http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&sortKeys=usage&maximumRecords=2
  RDF returned by the authority:
    <http://id.worldcat.org/fast/31622>
      a schema:Person ;
      dcterms:identifier "31622" ;
      skos:prefLabel "Twain, Mark, 1835-1910" ;
      skos:altLabel "Make Teviin, 1835-1910", "Make Tuwen, 1835-1910" .
    <http://id.worldcat.org/fast/365563>
      a schema:Person ;
      dcterms:identifier "365563" ;
      skos:prefLabel "Twain, Shania" ;
      skos:altLabel "Twain, Eilleen", "Edwards, Eilleen" .
  Normalized results returned to the UI:
    [{"uri": "http://id.worldcat.org/fast/31622", "id": "31622", "label": "Twain, Mark, 1835-1910"},
     {"uri": "http://id.worldcat.org/fast/365563", "id": "365563", "label": "Twain, Shania"}]
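The normalization step on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not the Questioning Authority implementation: the function name `normalize`, the triple-tuple input, and the fallback of using the URI as the id when no identifier is present are all assumptions made for the example.

```python
from collections import defaultdict

SKOS = "http://www.w3.org/2004/02/skos/core#"
DCTERMS = "http://purl.org/dc/terms/"


def normalize(triples, label_predicate=SKOS + "prefLabel",
              id_predicate=DCTERMS + "identifier"):
    """Collapse (subject, predicate, object) triples into uri/id/label records."""
    props_by_subject = defaultdict(dict)
    for subject, predicate, value in triples:
        # Keep only the first value seen for each predicate of a subject.
        props_by_subject[subject].setdefault(predicate, value)

    results = []
    for uri, props in props_by_subject.items():
        if label_predicate not in props:
            continue  # nothing to show the cataloger for this subject
        results.append({
            "uri": uri,
            # Assumption: fall back to the URI when no identifier is present.
            "id": props.get(id_predicate, uri),
            "label": props[label_predicate],
        })
    return results


# Hand-written triples mirroring the OCLC FAST example on the slide.
FAST_TRIPLES = [
    ("http://id.worldcat.org/fast/31622", DCTERMS + "identifier", "31622"),
    ("http://id.worldcat.org/fast/31622", SKOS + "prefLabel", "Twain, Mark, 1835-1910"),
    ("http://id.worldcat.org/fast/31622", SKOS + "altLabel", "Make Tuwen, 1835-1910"),
    ("http://id.worldcat.org/fast/365563", DCTERMS + "identifier", "365563"),
    ("http://id.worldcat.org/fast/365563", SKOS + "prefLabel", "Twain, Shania"),
]

if __name__ == "__main__":
    for record in normalize(FAST_TRIPLES):
        print(record)
```

Passing a different `label_predicate` is one way such a layer could cope with authorities that do not use SKOS labels.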
Direct Access Query API
Direct against authority…
http://experimental.worldcat.org/fast/search
  ?query=oclc.personalName+%22twain%22
  &maximumRecords=2
http://api.geonames.org/search
  ?q=ithaca
  &maxRows=2
  &username=demo
  &type=rdf
http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search
  ?query=milk
  &lang=en
  &maxhits=2
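To make the differences between the three direct requests concrete, here is a small Python sketch that builds them from a per-authority table. The endpoints and parameter names are taken from the slide; the table layout, the key names, and the function `direct_search_url` are hypothetical and exist only for this example.

```python
from urllib.parse import urlencode

# Endpoints and parameter names as shown on the slide; the dictionary
# structure itself is illustrative.
AUTHORITIES = {
    "oclc_fast": {
        "endpoint": "http://experimental.worldcat.org/fast/search",
        "term_param": "query",
        "term_template": 'oclc.personalName "{term}"',
        "limit_param": "maximumRecords",
        "fixed": {},
    },
    "geonames": {
        "endpoint": "http://api.geonames.org/search",
        "term_param": "q",
        "term_template": "{term}",
        "limit_param": "maxRows",
        "fixed": {"username": "demo", "type": "rdf"},
    },
    "agrovoc": {
        "endpoint": "http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search",
        "term_param": "query",
        "term_template": "{term}",
        "limit_param": "maxhits",
        "fixed": {"lang": "en"},
    },
}


def direct_search_url(authority, term, limit):
    """Build the authority-specific search URL for a term."""
    spec = AUTHORITIES[authority]
    params = {
        spec["term_param"]: spec["term_template"].format(term=term),
        spec["limit_param"]: limit,
    }
    params.update(spec["fixed"])
    return spec["endpoint"] + "?" + urlencode(params)


if __name__ == "__main__":
    print(direct_search_url("oclc_fast", "twain", 2))
    print(direct_search_url("geonames", "ithaca", 2))
    print(direct_search_url("agrovoc", "milk", 2))
```

Every authority needs its own row in the table, which is the burden a normalization layer is meant to take off the UI.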
Normalized Query API
Through QA normalization layer…
http://localhost:3000/qa/search/linked_data/oclc_fast
  ?q=twain
  &maxRecords=2
http://localhost:3000/qa/search/linked_data/geonames
  ?q=ithaca
  &maxRecords=2
http://localhost:3000/qa/search/linked_data/agrovoc
  ?q=milk
  &maxRecords=2
  &lang=en
Normalized Results
OCLC FAST
  [{"uri": "http://id.worldcat.org/fast/31622", "id": "31622", "label": "Twain, Mark, 1835-1910"},
   {"uri": "http://id.worldcat.org/fast/365563", "id": "365563", "label": "Twain, Shania"}]
GeoNames
  [{"uri": "http://sws.geonames.org/2162552", "id": "http://sws.geonames.org/2162552", "label": "Ithaca (AU)"},
   {"uri": "http://sws.geonames.org/4515289", "id": "http://sws.geonames.org/4515289", "label": "Ithaca (US)"}]
AGROVOC
  [{"uri": "http://aims.fao.org/aos/agrovoc/c_8602", "id": "http://aims.fao.org/aos/agrovoc/c_8602", "label": "acidophilus milk"},
   {"uri": "http://aims.fao.org/aos/agrovoc/c_16076", "id": "http://aims.fao.org/aos/agrovoc/c_16076", "label": "buffalo milk"}]
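Because every authority is queried and answered in the same shape, client code can stay identical across authorities. The Python sketch below assumes the localhost base URL shown on the slides and a JSON response of uri/id/label records; the helper names `qa_search_url` and `id_label_pairs` are made up for illustration, and no network call is made.

```python
import json
from urllib.parse import urlencode

# Host and port as shown on the slides; adjust for a real deployment.
QA_BASE = "http://localhost:3000/qa/search/linked_data"


def qa_search_url(authority, term, max_records=2, **extra):
    """Build a normalized search URL; only the authority name changes."""
    params = {"q": term, "maxRecords": max_records}
    params.update(extra)
    return f"{QA_BASE}/{authority}?{urlencode(params)}"


def id_label_pairs(response_body):
    """Extract (id, label) pairs from a normalized JSON response."""
    return [(item["id"], item["label"]) for item in json.loads(response_body)]


# Sample body copied from the OCLC FAST result on the slide.
SAMPLE_BODY = json.dumps([
    {"uri": "http://id.worldcat.org/fast/31622", "id": "31622",
     "label": "Twain, Mark, 1835-1910"},
    {"uri": "http://id.worldcat.org/fast/365563", "id": "365563",
     "label": "Twain, Shania"},
])

if __name__ == "__main__":
    print(qa_search_url("agrovoc", "milk", lang="en"))
    print(id_label_pairs(SAMPLE_BODY))
```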
Second Set of Challenges
5. Reliability & Efficiency, e.g. server uptime, server load
6. Accuracy, e.g. select results based on usage data, lexical match, custom weighting, other
7. Order Ranking, e.g. how to order a graph
Cache Server Query Process
One full setup per authority: JSP Query API, Lucene/SOLR index, Jena-Fuseki triplestore
Request to the JSP Query API:
  http://services.ld4l.org/ld4l_services/loc_name_batch.jsp?query=ezra%20cornell&maxRecords=10
1. Lucene search for "ezra cornell" (index built with the values of the predicates <skos:prefLabel> and <skos:altLabel>)
2. For each result: extract search rank, extract URI
3. SPARQL query for URI against the Jena-Fuseki triplestore
4. Insert search rank in predicate <http://vivoweb.org/ontology/core#rank>
5. Combine all results
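The query process on this slide can be sketched end to end with in-memory stand-ins. In this Python sketch a dictionary plays the role of the Lucene index and a list of tuples plays the role of the triplestore; the scoring rule, the function names, and the example.org URIs are assumptions for illustration, not the cache server's actual code.

```python
RANK = "http://vivoweb.org/ontology/core#rank"
PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def search_index(index, query, max_records):
    """Stand-in for the Lucene search: rank URIs by how many query words
    occur in their indexed prefLabel/altLabel values."""
    words = query.lower().split()
    scored = []
    for uri, labels in index.items():
        text = " ".join(labels).lower()
        score = sum(1 for word in words if word in text)
        if score:
            scored.append((score, uri))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort
    return [uri for _, uri in scored[:max_records]]


def describe(store, uri):
    """Stand-in for the SPARQL query for a URI: every triple about it."""
    return [triple for triple in store if triple[0] == uri]


def cache_query(index, store, query, max_records=10):
    """Search, fetch each hit's triples, attach its rank, combine all."""
    graph = []
    hits = search_index(index, query, max_records)
    for rank, uri in enumerate(hits, start=1):
        graph.extend(describe(store, uri))
        graph.append((uri, RANK, rank))  # rank travels inside the graph
    return graph


# Hypothetical data; the URIs are placeholders, not real authority URIs.
INDEX = {
    "http://example.org/auth/1": ["Cornell, Ezra, 1807-1874"],
    "http://example.org/auth/2": ["Cornell University"],
}
STORE = [
    ("http://example.org/auth/1", PREF_LABEL, "Cornell, Ezra, 1807-1874"),
    ("http://example.org/auth/2", PREF_LABEL, "Cornell University"),
]

if __name__ == "__main__":
    for triple in cache_query(INDEX, STORE, "ezra cornell"):
        print(triple)
```

Storing the rank as a triple is what lets a consumer restore result order from an otherwise unordered graph.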
UI-QA-Authority
Hyrax/VitroLib – UI for selecting an entry from an authority
  http://localhost:3000/qa/search/linked_data/oclc_fast/personal_name?q=twain&maximumRecords=2
QA – normalizes RDF returned from an authority
  [{"uri": "http://id.worldcat.org/fast/31622", "id": "31622", "label": "Twain, Mark, 1835-1910"},
   {"uri": "http://id.worldcat.org/fast/365563", "id": "365563", "label": "Twain, Shania"}]
Sources of the RDF of search results:
  Direct access of external authority
    http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&sortKeys=usage&maximumRecords=2
  Active-Triples
  LDF Cache (Marmotta or Blazegraph)
  LDF Cache
  Jena-Fuseki-Lucene Cache (search of cache performed via Lucene index)
Third Set of Challenges
8. Disambiguation through better context, e.g. expand from just prefLabel to… prefLabel, altLabel, birth/death dates, occupation, etc.
9. Reconciliation across multiple sources, e.g. match LoC URI to OCLC FAST URI
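As a rough illustration of challenge 9, the Python sketch below pairs records from two sources when their labels agree after folding case, accents, and punctuation. This is a deliberately naive technique chosen for the example, not an approach described in the slides, and it would miss any pair whose headings differ in wording. The function names and the example.org URI are hypothetical.

```python
import re
import unicodedata


def label_key(label):
    """Fold case, accents and punctuation so equivalent headings compare equal."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(re.findall(r"\w+", without_marks.casefold()))


def reconcile(source_a, source_b):
    """Pair URIs from two lists of uri/label records with identical keys."""
    uri_by_key = {label_key(rec["label"]): rec["uri"] for rec in source_b}
    pairs = []
    for rec in source_a:
        key = label_key(rec["label"])
        if key in uri_by_key:
            pairs.append((rec["uri"], uri_by_key[key]))
    return pairs


# The first URI is a placeholder standing in for another authority's record.
OTHER_SOURCE = [
    {"uri": "http://example.org/other/1", "label": "Twain, Mark, 1835-1910"},
]
FAST = [
    {"uri": "http://id.worldcat.org/fast/31622", "label": "Twain, Mark, 1835-1910"},
    {"uri": "http://id.worldcat.org/fast/365563", "label": "Twain, Shania"},
]

if __name__ == "__main__":
    print(reconcile(OTHER_SOURCE, FAST))
```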
What's next
Addressing Architectural Challenges
• Generalize the process for accessing context on the cache server and in the normalization layer
• Multi-authority search and reconciliation
• Address the need for cache refresh
• Mirrored cache servers
User Experience and Design
• User-centered design
• Observe, listen, learn, design, evaluate, iterate
• Iteratively design and evaluate UI for lookup/authorities with catalogers
• Search result ranking/ordering/filtering for catalogers
• Additional UI platforms, e.g. FOLIO
Questions
http://tinyurl.com/ld4l-auth-access
Appendix for Challenges 1-4
Challenge 1: Documentation
LoC: http://id.loc.gov/techcenter/
  C. Harlow notes on reconciling LoC: https://github.com/cmh2166/lc-reconcile
OCLC FAST: https://www.oclc.org/developer/develop/web-services/fast-api/linked-data.en.html
GeoNames: http://www.geonames.org/export/geonames-search.html
AGROVOC: http://aims.fao.org/vest-registry/vocabularies/agrovoc-multilingual-agricultural-thesaurus
  swagger config: https://github.com/NatLibFi/Skosmos/blob/master/swagger.json
NALT: https://agclass.nal.usda.gov/
DBpedia: http://wiki.dbpedia.org/OnlineAccess#1.2%20Public%20Faceted%20Web%20Service%20Interface
Challenge 2: Linked Data Access API
LoC
  Search query: not supported
  Term fetch: URI
OCLC FAST
  Search query: http://experimental.worldcat.org/fast/search?query={subauth}+all+%22{query}%22&sortKeys=usage&maximumRecords={maximumRecords}
  Term fetch: URI
GeoNames
  Search query: http://api.geonames.org/search?q={query}&maxRows={maxRows}&username={username}&type=rdf
  Term fetch: URI
AGROVOC
  Search query: http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search?query={query}&lang={lang}
  Term fetch: http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/data?uri=http://aims.fao.org/aos/agrovoc/{term_id}
NALT
  Search query: http://skosmos.library.cornell.edu/rest/v1/nalt/search?query={query}&lang={lang}
  Term fetch: http://skosmos.library.cornell.edu/rest/v1/nalt/data?uri={term_uri}
DBpedia
Challenge 3: Varying Results Formats
Authority    Search query     Term fetch
LoC          not supported    rdf-xml
OCLC FAST    rdf-xml          rdf-xml
GeoNames     rdf-xml          rdf-xml
AGROVOC      json-ld          rdf-xml, json-ld, turtle
NALT         json-ld          rdf-xml, json-ld, turtle
DBpedia
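One consequence of the table above is that a client has to choose a parser per authority. The Python sketch below encodes the term-fetch column and picks the first mutually supported format, together with its standard media type. The preference order and the names `TERM_FETCH_FORMATS` and `pick_format` are assumptions for illustration; whether a given authority honors an HTTP Accept header is not stated in the slides.

```python
# Standard media types for the RDF serializations named in the table.
MEDIA_TYPES = {
    "rdf-xml": "application/rdf+xml",
    "json-ld": "application/ld+json",
    "turtle": "text/turtle",
    "n-triples": "application/n-triples",
}

# Term-fetch formats per authority, transcribed from the table.
TERM_FETCH_FORMATS = {
    "loc": ["rdf-xml"],
    "oclc_fast": ["rdf-xml"],
    "geonames": ["rdf-xml"],
    "agrovoc": ["rdf-xml", "json-ld", "turtle"],
    "nalt": ["rdf-xml", "json-ld", "turtle"],
}


def pick_format(authority, preferred=("json-ld", "turtle", "rdf-xml")):
    """Return (format, media type) for the first preferred format offered."""
    offered = TERM_FETCH_FORMATS[authority]
    for fmt in preferred:
        if fmt in offered:
            return fmt, MEDIA_TYPES[fmt]
    raise ValueError(f"no supported format for {authority!r}")


if __name__ == "__main__":
    for name in TERM_FETCH_FORMATS:
        print(name, pick_format(name))
```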
Challenge 4: Varying Ontologies
Authority    Primary ontology     Flat vs. navigation required
LoC          madsrdf, SKOS        navigation required
OCLC FAST    schema.org, SKOS     flat
GeoNames     geonames             flat, hierarchical
AGROVOC      SKOS                 flat, hierarchical
NALT         SKOS                 flat, hierarchical
DBpedia      dbpedia              flat
Configurations for Questioning Authority
LoC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_loc/config/authorities/linked_data
OCLC FAST: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_oclcfast/config/authorities/linked_data
GeoNames: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_geonames/config/authorities/linked_data
AGROVOC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_agrovoc/config/authorities/linked_data
NALT: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_nalt/config/authorities/linked_data
DBpedia: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_dbpedia/config/authorities/linked_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
• 8-core, 64 GB, 3 GHz Mac Pro (late 2013), macOS Sierra (10.12.6)
• 32 TB Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
• Apache Jena Fuseki 2.4.0 provides SPARQL endpoint
• Apache Tomcat 9.0 runs custom web application(s)
• Apache Lucene 3.6 provides search interface
Customizations
• custom per-data-source JSP web application provides search/browse/download functionality
• custom (generic) SPARQL Tag Library provides API for web apps (available at https://github.com/eichmann/lod-utilities)
• custom (generic) Lucene Tag Library provides API for web apps
Loading a New Vocabulary
• download RDF
• if necessary, convert to n-triples (required for GeoNames data, for instance)
• use tdbloader2 to populate the triplestore
• configure Fuseki server(s) with triplestore details
• create a new JSP project in Eclipse
• write one or more indexer programs that populate Lucene indices, and run the indexer(s)
• write search/browse/download application logic using the SPARQL and Lucene tags
• package the project as a WAR
• deploy to Apache Tomcat server(s)
• add the new service to the Apache HTTPD virtual host specification
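The triplestore steps in this list can be sketched as the command lines they would roughly correspond to. The Python below only builds and prints the commands; it does not run them. The file name, directory, and dataset name are placeholders, and serving the TDB directory with `fuseki-server --loc=` is one possible way to handle the "configure Fuseki" step, assumed here rather than taken from the slides.

```python
import shlex


def load_commands(ntriples_file, tdb_dir, dataset_name):
    """Return argv lists for loading a vocabulary and serving it."""
    return [
        # Bulk-load the n-triples file into a TDB database directory.
        ["tdbloader2", "--loc", tdb_dir, ntriples_file],
        # Assumption: expose that directory as a Fuseki dataset.
        ["fuseki-server", "--loc=" + tdb_dir, "/" + dataset_name],
    ]


def as_shell(commands):
    """Render argv lists as shell-safe command strings."""
    return [shlex.join(command) for command in commands]


if __name__ == "__main__":
    # Placeholder paths for illustration only.
    for line in as_shell(
        load_commands("geonames.nt", "/data/tdb/geonames", "geonames")
    ):
        print(line)
```

The remaining steps (indexers, JSP application, WAR packaging, HTTPD configuration) are specific to the custom tag libraries and are not sketched here.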
UI Access to Cache Server
http://services.ld4l.org/ld4l_services/loc_name.jsp
Downloads
LoC: http://id.loc.gov/download/ (n-triples OR rdf-xml)
OCLC FAST: http://www.oclc.org/research/themes/data-science/fast/download.html (n-triples)
GeoNames: http://www.geonames.org/ontology/documentation.html (custom format – see notes for processing)
AGROVOC: https://aims-fao.atlassian.net/wiki/spaces/AGV/pages/2949126/Releases (n-triples OR rdf-xml)
NALT: https://agclass.nal.usda.gov/download.shtml (rdf-xml)
DBpedia: http://wiki.dbpedia.org/downloads-2016-04
Potential Options for Reconciliation
• VIAF for name reconciliation – we are doing some work with this
• Wikidata – I've heard that they are working on reconciliation issues, but haven't yet explored in depth
  • Intro Video (3hrs)
  • API Access
  • SPARQL – user manual
  • federated queries with other authorities
Doing a Google search for "linked data reconciliation" returns a large number of articles and presentations on this concept.
Links to Code & More
• QA Server - Code for a small app that provides the Questioning Authority normalization layer
• Linked Data Authorities - Configurations that can be used with QA Server
• LD4L Services - UI access to Cache Server
• VitroLib - Code for the VitroLib cataloging tool
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
httpexperimentalworldcatorgfast
searchquery=oclcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
httpexperimentalworldcatorgfast
searchquery=oclcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
lthttpidworldcatorgfast31622gt
a schemaPerson
dctermsidentifier 31622
skosprefLabel Twain Mark 1835-1910
skosaltLabel Make Teviin 1835-1910
Make Tuwen 1835-1910
lthttpidworldcatorgfast365563gt
a schemaPerson
dctermsidentifier 365563
skosprefLabel Twain Shania
skosaltLabel Twain Eilleen
Edwards Eilleen
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
[urihttpidworldcatorgfast31622
id31622 labelTwain Mark 1835-1910
urihttpidworldcatorgfast365563
id365563labelTwain Shania ]
httpexperimentalworldcatorgfast
searchquery=oclcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
lthttpidworldcatorgfast31622gt
a schemaPerson
dctermsidentifier 31622
skosprefLabel Twain Mark 1835-1910
skosaltLabel Make Teviin 1835-1910
Make Tuwen 1835-1910
lthttpidworldcatorgfast365563gt
a schemaPerson
dctermsidentifier 365563
skosprefLabel Twain Shania
skosaltLabel Twain Eilleen
Edwards Eilleen
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Direct Access Query API
Direct against authorityhellip
httpexperimentalworldcatorgfastsearch
query=oclcpersonalName+22twain22
ampmaximumRecords=2
httpapigeonamesorgsearch
q=ithaca
ampmaxRows=2
ampusername=demo
amptype=rdf
httpartemideartuniroma2it8081agrovocrestv1search
query=milk
amplang=en
ampmaxhits=2
Normalized Query API
Through QA normalization layerhellip
httplocalhost3000qasearchlinked_dataoclc_fast
q=twain
ampmaxRecords=2
httplocalhost3000qasearchlinked_datageonames
q=ithaca
ampmaxRecords=2
httplocalhost3000qasearchlinked_dataagrovoc
q=milk
ampmaxRecords=2
amplang=en
Normalized Results
[urihttpidworldcatorgfast31622 id31622 labelTwain Mark 1835-1910 urihttpidworldcatorgfast365563 id365563 labelTwain Shania]
[uri httpswsgeonamesorg2162552 id httpswsgeonamesorg2162552 label Ithaca (AU) uri httpswsgeonamesorg4515289 id httpswsgeonamesorg4515289 label Ithaca (US)]
[uri httpaimsfaoorgaosagrovocc_8602 id httpaimsfaoorgaosagrovocc_8602 label acidophilus milk uri httpaimsfaoorgaosagrovocc_16076 id httpaimsfaoorgaosagrovocc_16076 label buffalo milk]
OCLC FAST GeoNames AgroVoc
Second Set of Challenges
5 Reliability amp Efficiency eg server uptime server load
6 Accuracy eg select results based on usage data lexical match
custom weighting other
7 Order Ranking eg How to order a graph
Cache Server Query Process
JSP Query API
Jena-Fuseki
Triplestore
One full setup per authority
LuceneSOLR
Index
Cache Server Query Process
JSP Query API
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
Jena-Fuseki
Triplestore
LuceneSOLR
Index
One full setup per authority
Cache Server Query Process
JSP Query API
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
Jena-Fuseki
Triplestore
LuceneSOLR
Index
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
extract search rank
extract URI
Jena-Fuseki
Triplestore
for each result
LuceneSOLR
Index
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
sparql query for URI
Jena-Fuseki
Triplestore
LuceneSOLR
Index
extract search rank
extract URI
for each result
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
combine all results
Jena-Fuseki
Triplestore
insert search rank in predicate
lthttpvivoweborgontology
corerankgt
LuceneSOLR
Index
sparql query for URI extract search rank
extract URI
for each result
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
UI-QA-Authority
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainampmaximumRecords=2
[urihttpidworldcatorgfast31622id31622
labelTwain Mark 1835-1910
urihttpidworldcatorgfast365563id365563
labelTwain Shania
httpexperimentalworldcatorgfastsearchquery=o
clcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
RDF of
search
results
Active-Triples
LDF Cache
(Marmotta or
Blazegraph) LDF Cache Jena-Fuseki-
Lucene
Cache
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
search of cache performed via Lucene index
Third Set of Challenges
8 Disambiguation through better context eg expand from just prefLabel tohellip
preLabel altLabel birthdeath dates occupation etc
9 Reconciliation across multiple sources eg match LoC URI to OCLC FAST URI
53
Whatrsquos next
Addressing Architectural Challenges
bull Generalize process for accessing context on the
cache server and in the normalization layer
bull Multi-authority search and reconciliation
bull Address the need for cache refresh
bull Mirrored cache servers
User Experience and Design
bull User-centered Design
bull Observe listen learn design evaluate iterate
bull Iteratively design and evaluate UI for lookupauthorities
with catalogers
bull Search result rankingorderingfiltering for catalogers
bull Additional UI platforms eg FOLIO
56
Questions
httptinyurlcomld4l-auth-access
Appendix for Challenges 1-4
Challenge 1 Documentation
58
LoC httpidlocgovtechcenter
C Harlow notes on reconciling LoC - httpsgithubcomcmh2166lc-reconcile
OCLC FAST
httpswwwoclcorgdeveloperdevelopweb-servicesfast-apilinked-dataenhtml
GeoNames
httpwwwgeonamesorgexportgeonames-searchhtml
AGROVOC httpaimsfaoorgvest-registryvocabulariesagrovoc-multilingual-agricultural-thesaurus
swagger config httpsgithubcomNatLibFiSkosmosblobmasterswaggerjson
NALT
httpsagclassnalusdagov
DBpedia httpwikidbpediaorgOnlineAccess1220Public20Faceted20Web20Service20Inter
face
Challenge 2 Linked Data Access API
59
for Search Query for Term Fetch
LoC not supported URI
OCLC FAST httpexperimentalworldcatorgfastsearchq
uery=subauth+all+22query22ampsortK
eys=usageampmaximumRecords=maximumR
ecords
URI
GeoNames httpapigeonamesorgsearchq=queryamp
maxRows=maxRowsampusername=userna
meamptype=rdf
URI
AGROVOC httpartemideartuniroma2it8081agrovocr
estv1searchquery=queryamplang=lang
httpartemideartuniroma2it8081agrovo
crestv1datauri=httpaimsfaoorgaosa
grovocterm_id
NALT httpskosmoslibrarycornelledurestv1nalt
searchquery=queryamplang=lang
httpskosmoslibrarycornelledurestv1na
ltdatauri=term_uri
DBpedia
Challenge 3 Varying Results Formats
60
for Search Query for Term Fetch
LoC not supported rdf-xml
OCLC FAST rdf-xml rdf-xml
GeoNames rdf-xml rdf-xml
AGROVOC json-ld rdf-xml json-ld turtle
NALT json-ld rdf-xml json-ld turtle
DBpedia
Challenge 4 Varying Ontologies
61
Primary Ontology Flat vs Navigation
required
LoC madsrdf
SKOS
navigation required
OCLC FAST schemaorg
SKOS
flat
GeoNames geonames flat
hierarchical
AGROVOC SKOS flat
hierarchical
NALT SKOS flat
hierarchical
DBpedia dbpedia flat
Configurations for Questioning Authority
62
LoC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_locconfigauthoritieslinked_dat
a
OCLC FAST httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_oclcfastconfigauthoritieslinked
_data
GeoNames httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_geonamesconfigauthoritieslink
ed_data
AGROVOC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_agrovocconfigauthoritieslinked
_data
NALT httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_naltconfigauthoritieslinked_dat
a
DBpedia httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_dbpediaconfigauthoritieslinked
_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
• 8-core, 64 GB, 3 GHz Mac Pro (late 2013), macOS Sierra (10.12.6)
• 32 TB Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
• Apache Jena Fuseki 2.4.0 provides SPARQL endpoint
• Apache Tomcat 9.0 runs custom web application(s)
• Apache Lucene 3.6 provides search interface
Customizations
• custom per-data-source JSP web application provides search/browse/download functionality
• custom (generic) SPARQL Tag Library provides API for web apps (available at https://github.com/eichmann/lod-utilities)
• custom (generic) Lucene Tag Library provides API for web apps
Loading a New Vocabulary
• download RDF
• if necessary, convert to n-triples (required for GeoNames data, for instance)
• use tdbloader2 to populate the triplestore
• configure Fuseki server(s) with triplestore details
• create new JSP project in Eclipse
• write one or more indexer programs that populate Lucene indices, and run the indexer(s)
• write search/browse/download application logic using the SPARQL and Lucene tags
• package project as a WAR
• deploy to Apache Tomcat server(s)
• add new service to Apache HTTPD virtual host specification
UI Access to Cache Server
http://services.ld4l.org/ld4l_services/loc_name.jsp
Downloads
LoC: http://id.loc.gov/download/ (n-triples OR rdf-xml)
OCLC FAST: http://www.oclc.org/research/themes/data-science/fast/download.html (n-triples)
GeoNames: http://www.geonames.org/ontology/documentation.html (custom format - see notes for processing)
AGROVOC: https://aims-fao.atlassian.net/wiki/spaces/AGV/pages/2949126/Releases (n-triples OR rdf-xml)
NALT: https://agclass.nal.usda.gov/download.shtml (rdf-xml)
DBpedia: http://wiki.dbpedia.org/downloads-2016-04
Potential Options for Reconciliation
• VIAF for name reconciliation - we are doing some work with this
• Wikidata - I've heard that they are working on reconciliation issues, but haven't yet explored them in depth
  • Intro Video (3 hrs)
  • API Access
  • SPARQL - user manual
  • federated queries with other authorities
Doing a Google search for "linked data reconciliation" returns a large number of articles and presentations on this concept.
Links to Code & More
• QA Server - code for a small app that provides the Questioning Authority normalization layer
• Linked Data Authorities - configurations that can be used with QA Server
• LD4L Services - UI access to Cache Server
• VitroLib - code for the VitroLib cataloging tool
Multi-Server Architecture
Hyrax/VitroLib - UI for selecting an entry from an authority
QA - normalize RDF returned from an authority
  Request:
    http://localhost:3000/qa/search/linked_data/oclc_fast/personal_name?q=twain&maximumRecords=2
  Normalized response:
    [{"uri":"http://id.worldcat.org/fast/31622","id":"31622","label":"Twain, Mark, 1835-1910"},
     {"uri":"http://id.worldcat.org/fast/365563","id":"365563","label":"Twain, Shania"}]
Direct Access of External Authority
  Request:
    http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&sortKeys=usage&maximumRecords=2
  RDF returned:
    <http://id.worldcat.org/fast/31622>
      a schema:Person
      dcterms:identifier 31622
      skos:prefLabel Twain, Mark, 1835-1910
      skos:altLabel Make Teviin, 1835-1910
                    Make Tuwen, 1835-1910
    <http://id.worldcat.org/fast/365563>
      a schema:Person
      dcterms:identifier 365563
      skos:prefLabel Twain, Shania
      skos:altLabel Twain, Eilleen
                    Edwards, Eilleen
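The normalization step shown on this slide, reducing the RDF returned by an authority to a simple list of uri/id/label records, can be sketched roughly as follows. This is an illustration under stated assumptions rather than QA's actual code: the RDF is assumed to be already parsed into (subject, predicate, object) tuples, and `normalize` is a hypothetical helper name.

```python
import json

DCTERMS_IDENTIFIER = "http://purl.org/dc/terms/identifier"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
SKOS_ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"


def normalize(triples):
    """Collapse triples into one {uri, id, label} record per subject.

    Predicates other than dcterms:identifier and skos:prefLabel are ignored;
    records keep the order in which subjects first appear.
    """
    records = {}
    for subject, predicate, obj in triples:
        record = records.setdefault(subject, {"uri": subject, "id": None, "label": None})
        if predicate == DCTERMS_IDENTIFIER:
            record["id"] = obj
        elif predicate == SKOS_PREF_LABEL:
            record["label"] = obj
    return list(records.values())


# The two OCLC FAST results from the slide, written as triples.
FAST_TRIPLES = [
    ("http://id.worldcat.org/fast/31622", DCTERMS_IDENTIFIER, "31622"),
    ("http://id.worldcat.org/fast/31622", SKOS_PREF_LABEL, "Twain, Mark, 1835-1910"),
    ("http://id.worldcat.org/fast/31622", SKOS_ALT_LABEL, "Make Teviin, 1835-1910"),
    ("http://id.worldcat.org/fast/365563", DCTERMS_IDENTIFIER, "365563"),
    ("http://id.worldcat.org/fast/365563", SKOS_PREF_LABEL, "Twain, Shania"),
]

if __name__ == "__main__":
    print(json.dumps(normalize(FAST_TRIPLES), indent=2))
```

The UI only ever sees the flat records, so it does not need to know which ontology or serialization the authority used.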
Direct Access Query API
Direct against authority...
http://experimental.worldcat.org/fast/search
  ?query=oclc.personalName+%22twain%22
  &maximumRecords=2
http://api.geonames.org/search
  ?q=ithaca
  &maxRows=2
  &username=demo
  &type=rdf
http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search
  ?query=milk
  &lang=en
  &maxhits=2
Normalized Query API
Through QA normalization layer...
http://localhost:3000/qa/search/linked_data/oclc_fast
  ?q=twain
  &maxRecords=2
http://localhost:3000/qa/search/linked_data/geonames
  ?q=ithaca
  &maxRecords=2
http://localhost:3000/qa/search/linked_data/agrovoc
  ?q=milk
  &maxRecords=2
  &lang=en
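Comparing the direct and normalized query examples, the normalization layer in effect translates one common set of parameter names into each authority's native names. A minimal sketch of that translation is below. The native parameter names are taken from the direct-access examples in this deck; the `NATIVE_PARAMS` table and `to_native_query` function are hypothetical, and the sketch simplifies OCLC FAST, whose real query value also wraps the term in an index expression such as oclc.personalName.

```python
from urllib.parse import urlencode

# Normalized name -> native name, per authority (illustrative assumption).
NATIVE_PARAMS = {
    "oclc_fast": {"q": "query", "maxRecords": "maximumRecords"},
    "geonames": {"q": "q", "maxRecords": "maxRows"},
    "agrovoc": {"q": "query", "maxRecords": "maxhits", "lang": "lang"},
}


def to_native_query(authority, normalized):
    """Rename normalized parameters to the authority's own names.

    Parameters the authority has no equivalent for are dropped.
    Returns a percent-encoded query string without the leading '?'.
    """
    mapping = NATIVE_PARAMS[authority]
    native = {mapping[name]: value for name, value in normalized.items() if name in mapping}
    return urlencode(native)


if __name__ == "__main__":
    print(to_native_query("geonames", {"q": "ithaca", "maxRecords": 2}))
    print(to_native_query("agrovoc", {"q": "milk", "maxRecords": 2, "lang": "en"}))
```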
Normalized Results
OCLC FAST
  [{"uri":"http://id.worldcat.org/fast/31622","id":"31622","label":"Twain, Mark, 1835-1910"},
   {"uri":"http://id.worldcat.org/fast/365563","id":"365563","label":"Twain, Shania"}]
GeoNames
  [{"uri":"http://sws.geonames.org/2162552/","id":"http://sws.geonames.org/2162552/","label":"Ithaca (AU)"},
   {"uri":"http://sws.geonames.org/4515289/","id":"http://sws.geonames.org/4515289/","label":"Ithaca (US)"}]
AgroVoc
  [{"uri":"http://aims.fao.org/aos/agrovoc/c_8602","id":"http://aims.fao.org/aos/agrovoc/c_8602","label":"acidophilus milk"},
   {"uri":"http://aims.fao.org/aos/agrovoc/c_16076","id":"http://aims.fao.org/aos/agrovoc/c_16076","label":"buffalo milk"}]
Second Set of Challenges
5. Reliability & Efficiency, e.g., server uptime, server load
6. Accuracy, e.g., select results based on usage data, lexical match, custom weighting, other
7. Order / Ranking, e.g., how to order a graph
Cache Server Query Process
One full setup per authority: JSP Query API, Lucene/SOLR index, Jena-Fuseki triplestore
Request to the JSP Query API:
  http://services.ld4l.org/ld4l_services/loc_name_batch.jsp?query=ezra%20cornell&maxRecords=10
1. Lucene search for "ezra cornell" (index built with predicate values <skos:prefLabel> and <skos:altLabel>)
2. For each result: extract search rank, extract URI
3. SPARQL query for URI against the Jena-Fuseki triplestore
4. Insert search rank in predicate <http://vivoweb.org/ontology/core#rank>
5. Combine all results
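The query process above can be sketched end to end with in-memory stand-ins. This is a toy illustration, not the cache server's JSP code: a list scan replaces the Lucene index, a list filter replaces the SPARQL query against Fuseki, rank is simply match order rather than a Lucene relevance score, and the example.org URIs and all function names are hypothetical.

```python
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
SKOS_ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"
VIVO_RANK = "http://vivoweb.org/ontology/core#rank"
LABEL_PREDICATES = (SKOS_PREF_LABEL, SKOS_ALT_LABEL)


def build_index(triples):
    """Toy stand-in for the Lucene index: (lowercased label text, URI) pairs."""
    return [(obj.lower(), subj) for subj, pred, obj in triples if pred in LABEL_PREDICATES]


def search_index(index, query, max_records):
    """Return URIs whose label contains every query word, in index order."""
    words = query.lower().split()
    hits = []
    for text, uri in index:
        if all(word in text for word in words) and uri not in hits:
            hits.append(uri)
    return hits[:max_records]


def describe(triples, uri):
    """Toy stand-in for the SPARQL query: every triple about one URI."""
    return [triple for triple in triples if triple[0] == uri]


def cached_search(triples, query, max_records=10):
    """Search labels, fetch each hit's graph, add its rank, combine all graphs."""
    index = build_index(triples)
    combined = []
    for rank, uri in enumerate(search_index(index, query, max_records), start=1):
        graph = describe(triples, uri)
        graph.append((uri, VIVO_RANK, str(rank)))  # carry the search rank in the RDF
        combined.extend(graph)
    return combined


# Hypothetical sample data; the URIs are placeholders, not real authority URIs.
SAMPLE_TRIPLES = [
    ("http://example.org/auth/1", SKOS_PREF_LABEL, "Cornell, Ezra, 1807-1874"),
    ("http://example.org/auth/2", SKOS_PREF_LABEL, "Cornell University"),
]

if __name__ == "__main__":
    for triple in cached_search(SAMPLE_TRIPLES, "ezra cornell"):
        print(triple)
```

Writing the rank into the graph as a predicate lets a downstream consumer restore result order even though RDF graphs themselves are unordered.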
UI-QA-Authority
Hyrax/VitroLib - UI for selecting an entry from an authority
QA - normalize RDF returned from an authority
  Request:
    http://localhost:3000/qa/search/linked_data/oclc_fast/personal_name?q=twain&maximumRecords=2
  Normalized response:
    [{"uri":"http://id.worldcat.org/fast/31622","id":"31622","label":"Twain, Mark, 1835-1910"},
     {"uri":"http://id.worldcat.org/fast/365563","id":"365563","label":"Twain, Shania"}]
RDF of search results
Diagram components:
• Active-Triples
• LDF Cache (Marmotta or Blazegraph)
• LDF Cache
• Jena-Fuseki-Lucene Cache
• Direct Access of External Authority
    http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&sortKeys=usage&maximumRecords=2
Search of cache performed via Lucene index
Third Set of Challenges
8. Disambiguation through better context, e.g., expand from just prefLabel to... prefLabel, altLabel, birth/death dates, occupation, etc.
9. Reconciliation across multiple sources, e.g., match LoC URI to OCLC FAST URI
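To make the reconciliation challenge concrete, the simplest conceivable approach is to pair entries from two authorities whose labels match after normalization. The sketch below shows only that naive baseline and is not the authors' method: exact label matching misses variant headings and can wrongly pair distinct people with the same name, which is exactly why context (dates, occupation) matters. The function names and the example.org URI are hypothetical.

```python
import re
import unicodedata


def label_key(label):
    """Reduce a heading to a comparison key: no accents, case, or punctuation."""
    decomposed = unicodedata.normalize("NFKD", label)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.findall(r"[a-z0-9]+", unaccented.lower()))


def reconcile(source_records, target_records):
    """Map each source URI to the target URIs that share its label key."""
    by_key = {}
    for record in target_records:
        by_key.setdefault(label_key(record["label"]), []).append(record["uri"])
    return {
        record["uri"]: by_key.get(label_key(record["label"]), [])
        for record in source_records
    }


if __name__ == "__main__":
    source = [{"uri": "http://example.org/loc/a", "label": "Twain, Mark, 1835-1910"}]
    target = [
        {"uri": "http://id.worldcat.org/fast/31622", "label": "Twain, Mark, 1835-1910"},
        {"uri": "http://id.worldcat.org/fast/365563", "label": "Twain, Shania"},
    ]
    print(reconcile(source, target))
```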
What's next
Addressing Architectural Challenges
• Generalize process for accessing context on the cache server and in the normalization layer
• Multi-authority search and reconciliation
• Address the need for cache refresh
• Mirrored cache servers
User Experience and Design
• User-centered Design
• Observe, listen, learn, design, evaluate, iterate
• Iteratively design and evaluate UI for lookup/authorities with catalogers
• Search result ranking/ordering/filtering for catalogers
• Additional UI platforms, e.g., FOLIO
Questions
http://tinyurl.com/ld4l-auth-access
Appendix for Challenges 1-4
Challenge 1: Documentation
LoC: http://id.loc.gov/techcenter/
  C. Harlow notes on reconciling LoC - https://github.com/cmh2166/lc-reconcile
OCLC FAST: https://www.oclc.org/developer/develop/web-services/fast-api/linked-data.en.html
GeoNames: http://www.geonames.org/export/geonames-search.html
AGROVOC: http://aims.fao.org/vest-registry/vocabularies/agrovoc-multilingual-agricultural-thesaurus
  swagger config: https://github.com/NatLibFi/Skosmos/blob/master/swagger.json
NALT: https://agclass.nal.usda.gov/
DBpedia: http://wiki.dbpedia.org/OnlineAccess#1.2%20Public%20Faceted%20Web%20Service%20Interface
Challenge 2: Linked Data Access API
LoC
  Search query: not supported
  Term fetch: URI
OCLC FAST
  Search query: http://experimental.worldcat.org/fast/search?query={subauth}+all+%22{query}%22&sortKeys=usage&maximumRecords={maximumRecords}
  Term fetch: URI
GeoNames
  Search query: http://api.geonames.org/search?q={query}&maxRows={maxRows}&username={username}&type=rdf
  Term fetch: URI
AGROVOC
  Search query: http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search?query={query}&lang={lang}
  Term fetch: http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/data?uri=http://aims.fao.org/aos/agrovoc/{term_id}
NALT
  Search query: http://skosmos.library.cornell.edu/rest/v1/nalt/search?query={query}&lang={lang}
  Term fetch: http://skosmos.library.cornell.edu/rest/v1/nalt/data?uri={term_uri}
DBpedia
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
httpexperimentalworldcatorgfast
searchquery=oclcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
httpexperimentalworldcatorgfast
searchquery=oclcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
lthttpidworldcatorgfast31622gt
a schemaPerson
dctermsidentifier 31622
skosprefLabel Twain Mark 1835-1910
skosaltLabel Make Teviin 1835-1910
Make Tuwen 1835-1910
lthttpidworldcatorgfast365563gt
a schemaPerson
dctermsidentifier 365563
skosprefLabel Twain Shania
skosaltLabel Twain Eilleen
Edwards Eilleen
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
[urihttpidworldcatorgfast31622
id31622 labelTwain Mark 1835-1910
urihttpidworldcatorgfast365563
id365563labelTwain Shania ]
httpexperimentalworldcatorgfast
searchquery=oclcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
lthttpidworldcatorgfast31622gt
a schemaPerson
dctermsidentifier 31622
skosprefLabel Twain Mark 1835-1910
skosaltLabel Make Teviin 1835-1910
Make Tuwen 1835-1910
lthttpidworldcatorgfast365563gt
a schemaPerson
dctermsidentifier 365563
skosprefLabel Twain Shania
skosaltLabel Twain Eilleen
Edwards Eilleen
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Direct Access Query API
Direct against authorityhellip
httpexperimentalworldcatorgfastsearch
query=oclcpersonalName+22twain22
ampmaximumRecords=2
httpapigeonamesorgsearch
q=ithaca
ampmaxRows=2
ampusername=demo
amptype=rdf
httpartemideartuniroma2it8081agrovocrestv1search
query=milk
amplang=en
ampmaxhits=2
Normalized Query API
Through QA normalization layerhellip
httplocalhost3000qasearchlinked_dataoclc_fast
q=twain
ampmaxRecords=2
httplocalhost3000qasearchlinked_datageonames
q=ithaca
ampmaxRecords=2
httplocalhost3000qasearchlinked_dataagrovoc
q=milk
ampmaxRecords=2
amplang=en
Normalized Results
[urihttpidworldcatorgfast31622 id31622 labelTwain Mark 1835-1910 urihttpidworldcatorgfast365563 id365563 labelTwain Shania]
[uri httpswsgeonamesorg2162552 id httpswsgeonamesorg2162552 label Ithaca (AU) uri httpswsgeonamesorg4515289 id httpswsgeonamesorg4515289 label Ithaca (US)]
[uri httpaimsfaoorgaosagrovocc_8602 id httpaimsfaoorgaosagrovocc_8602 label acidophilus milk uri httpaimsfaoorgaosagrovocc_16076 id httpaimsfaoorgaosagrovocc_16076 label buffalo milk]
OCLC FAST GeoNames AgroVoc
Second Set of Challenges
5 Reliability amp Efficiency eg server uptime server load
6 Accuracy eg select results based on usage data lexical match
custom weighting other
7 Order Ranking eg How to order a graph
Cache Server Query Process
JSP Query API
Jena-Fuseki
Triplestore
One full setup per authority
LuceneSOLR
Index
Cache Server Query Process
JSP Query API
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
Jena-Fuseki
Triplestore
LuceneSOLR
Index
One full setup per authority
Cache Server Query Process
JSP Query API
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
Jena-Fuseki
Triplestore
LuceneSOLR
Index
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
extract search rank
extract URI
Jena-Fuseki
Triplestore
for each result
LuceneSOLR
Index
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
sparql query for URI
Jena-Fuseki
Triplestore
LuceneSOLR
Index
extract search rank
extract URI
for each result
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
combine all results
Jena-Fuseki
Triplestore
insert search rank in predicate
lthttpvivoweborgontology
corerankgt
LuceneSOLR
Index
sparql query for URI extract search rank
extract URI
for each result
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
UI-QA-Authority
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainampmaximumRecords=2
[urihttpidworldcatorgfast31622id31622
labelTwain Mark 1835-1910
urihttpidworldcatorgfast365563id365563
labelTwain Shania
httpexperimentalworldcatorgfastsearchquery=o
clcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
RDF of
search
results
Active-Triples
LDF Cache
(Marmotta or
Blazegraph) LDF Cache Jena-Fuseki-
Lucene
Cache
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
search of cache performed via Lucene index
Third Set of Challenges
8 Disambiguation through better context eg expand from just prefLabel tohellip
preLabel altLabel birthdeath dates occupation etc
9 Reconciliation across multiple sources eg match LoC URI to OCLC FAST URI
53
Whatrsquos next
Addressing Architectural Challenges
bull Generalize process for accessing context on the
cache server and in the normalization layer
bull Multi-authority search and reconciliation
bull Address the need for cache refresh
bull Mirrored cache servers
User Experience and Design
bull User-centered Design
bull Observe listen learn design evaluate iterate
bull Iteratively design and evaluate UI for lookupauthorities
with catalogers
bull Search result rankingorderingfiltering for catalogers
bull Additional UI platforms eg FOLIO
56
Questions
httptinyurlcomld4l-auth-access
Appendix for Challenges 1-4
Challenge 1 Documentation
58
LoC httpidlocgovtechcenter
C Harlow notes on reconciling LoC - httpsgithubcomcmh2166lc-reconcile
OCLC FAST
httpswwwoclcorgdeveloperdevelopweb-servicesfast-apilinked-dataenhtml
GeoNames
httpwwwgeonamesorgexportgeonames-searchhtml
AGROVOC httpaimsfaoorgvest-registryvocabulariesagrovoc-multilingual-agricultural-thesaurus
swagger config httpsgithubcomNatLibFiSkosmosblobmasterswaggerjson
NALT
httpsagclassnalusdagov
DBpedia httpwikidbpediaorgOnlineAccess1220Public20Faceted20Web20Service20Inter
face
Challenge 2 Linked Data Access API
59
for Search Query for Term Fetch
LoC not supported URI
OCLC FAST httpexperimentalworldcatorgfastsearchq
uery=subauth+all+22query22ampsortK
eys=usageampmaximumRecords=maximumR
ecords
URI
GeoNames httpapigeonamesorgsearchq=queryamp
maxRows=maxRowsampusername=userna
meamptype=rdf
URI
AGROVOC httpartemideartuniroma2it8081agrovocr
estv1searchquery=queryamplang=lang
httpartemideartuniroma2it8081agrovo
crestv1datauri=httpaimsfaoorgaosa
grovocterm_id
NALT httpskosmoslibrarycornelledurestv1nalt
searchquery=queryamplang=lang
httpskosmoslibrarycornelledurestv1na
ltdatauri=term_uri
DBpedia
Challenge 3 Varying Results Formats
60
for Search Query for Term Fetch
LoC not supported rdf-xml
OCLC FAST rdf-xml rdf-xml
GeoNames rdf-xml rdf-xml
AGROVOC json-ld rdf-xml json-ld turtle
NALT json-ld rdf-xml json-ld turtle
DBpedia
Challenge 4 Varying Ontologies
61
Primary Ontology Flat vs Navigation
required
LoC madsrdf
SKOS
navigation required
OCLC FAST schemaorg
SKOS
flat
GeoNames geonames flat
hierarchical
AGROVOC SKOS flat
hierarchical
NALT SKOS flat
hierarchical
DBpedia dbpedia flat
Configurations for Questioning Authority
62
LoC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_locconfigauthoritieslinked_dat
a
OCLC FAST httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_oclcfastconfigauthoritieslinked
_data
GeoNames httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_geonamesconfigauthoritieslink
ed_data
AGROVOC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_agrovocconfigauthoritieslinked
_data
NALT httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_naltconfigauthoritieslinked_dat
a
DBpedia httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_dbpediaconfigauthoritieslinked
_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
bull 8-core 64gb 3Ghz Mac Pro (late 2013) macOS Sierra
(10126)
bull 32tb Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
bull Apache Jena Fuseki 240 provides SPARQL endpoint
bull Apache Tomcat 90 runs custom web application(s)
bull Apache Lucene 36 provides search interface
64
Customizations
bull custom per-data-source JSP web application provides
searchbrowsedownload functionality
bull custom (generic) SPARQL Tag Library provides API for web
apps (available at httpsgithubcomeichmannlod-utilities)
bull custom (generic) Lucene Tag Library provides API for web apps
65
Loading a New Vocabulary
bull download RDF
bull if necessary convert to n-triples (required for GeoNames data for instance)
bull use tdbloader2 to populated triplestore
bull configure Fuseki server(s) with triplestore details
bull create new JSP project in Eclipse
bull write one or more indexer programs that populate Lucene indices and run indexer(s)
bull write searchbrowsedownload application logic using the SPARQL and Lucene tags
bull package project as war
bull deploy to Apache Tomcat server(s)
bull add new service to Apache HTTPD virtual host specification
66
UI Access to Cache Server
httpservicesld4lorgld4l_servicesloc_namejsp
Downloads
68
LoC httpidlocgovdownload (n-triples OR rdf-xml)
OCLC FAST httpwwwoclcorgresearchthemesdata-sciencefastdownloadhtml (n-triples)
GeoNames httpwwwgeonamesorgontologydocumentationhtml (custom format ndash see notes for processing)
AGROVOC httpsaims-faoatlassiannetwikispacesAGVpages2949126Releases (n-triples OR rdf-xml)
NALT httpsagclassnalusdagovdownloadshtml (rdf-xml)
DBpedia httpwikidbpediaorgdownloads-2016-04
Potential Options for Reconciliation
bull VIAF for name reconciliation ndash we are doing some
work with this
bull Wikidata ndash Ive heard that they are working on
Reconciliation issues but havent yet explored in
depth bull Intro Video (3hrs)
bull API Access
bull SPARQL ndash user manual
bull federated queries with other authorities
Doing a google search for linked data reconciliation
returns a large number of articles and presentations
on this concept
Links to Code amp More
bull QA Server - Code for a small app that provides the
Questioning Authority normalization layer
bull Linked Data Authorities - Configurations that can
be used with QA Server
bull LD4L Services - UI access to Cache Server
bull VitroLib - Code for the VitroLib cataloging tool
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
httpexperimentalworldcatorgfast
searchquery=oclcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
lthttpidworldcatorgfast31622gt
a schemaPerson
dctermsidentifier 31622
skosprefLabel Twain Mark 1835-1910
skosaltLabel Make Teviin 1835-1910
Make Tuwen 1835-1910
lthttpidworldcatorgfast365563gt
a schemaPerson
dctermsidentifier 365563
skosprefLabel Twain Shania
skosaltLabel Twain Eilleen
Edwards Eilleen
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Multi-Server Architecture
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainamp
maximumRecords=2
[urihttpidworldcatorgfast31622
id31622 labelTwain Mark 1835-1910
urihttpidworldcatorgfast365563
id365563labelTwain Shania ]
httpexperimentalworldcatorgfast
searchquery=oclcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
lthttpidworldcatorgfast31622gt
a schemaPerson
dctermsidentifier 31622
skosprefLabel Twain Mark 1835-1910
skosaltLabel Make Teviin 1835-1910
Make Tuwen 1835-1910
lthttpidworldcatorgfast365563gt
a schemaPerson
dctermsidentifier 365563
skosprefLabel Twain Shania
skosaltLabel Twain Eilleen
Edwards Eilleen
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
Direct Access Query API
Direct against authorityhellip
httpexperimentalworldcatorgfastsearch
query=oclcpersonalName+22twain22
ampmaximumRecords=2
httpapigeonamesorgsearch
q=ithaca
ampmaxRows=2
ampusername=demo
amptype=rdf
httpartemideartuniroma2it8081agrovocrestv1search
query=milk
amplang=en
ampmaxhits=2
Normalized Query API
Through QA normalization layerhellip
httplocalhost3000qasearchlinked_dataoclc_fast
q=twain
ampmaxRecords=2
httplocalhost3000qasearchlinked_datageonames
q=ithaca
ampmaxRecords=2
httplocalhost3000qasearchlinked_dataagrovoc
q=milk
ampmaxRecords=2
amplang=en
Normalized Results
[urihttpidworldcatorgfast31622 id31622 labelTwain Mark 1835-1910 urihttpidworldcatorgfast365563 id365563 labelTwain Shania]
[uri httpswsgeonamesorg2162552 id httpswsgeonamesorg2162552 label Ithaca (AU) uri httpswsgeonamesorg4515289 id httpswsgeonamesorg4515289 label Ithaca (US)]
[uri httpaimsfaoorgaosagrovocc_8602 id httpaimsfaoorgaosagrovocc_8602 label acidophilus milk uri httpaimsfaoorgaosagrovocc_16076 id httpaimsfaoorgaosagrovocc_16076 label buffalo milk]
OCLC FAST GeoNames AgroVoc
Second Set of Challenges
5 Reliability amp Efficiency eg server uptime server load
6 Accuracy eg select results based on usage data lexical match
custom weighting other
7 Order Ranking eg How to order a graph
Cache Server Query Process
JSP Query API
Jena-Fuseki
Triplestore
One full setup per authority
LuceneSOLR
Index
Cache Server Query Process
JSP Query API
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
Jena-Fuseki
Triplestore
LuceneSOLR
Index
One full setup per authority
Cache Server Query Process
JSP Query API
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
Jena-Fuseki
Triplestore
LuceneSOLR
Index
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
extract search rank
extract URI
Jena-Fuseki
Triplestore
for each result
LuceneSOLR
Index
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
sparql query for URI
Jena-Fuseki
Triplestore
LuceneSOLR
Index
extract search rank
extract URI
for each result
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
combine all results
Jena-Fuseki
Triplestore
insert search rank in predicate
lthttpvivoweborgontology
corerankgt
LuceneSOLR
Index
sparql query for URI extract search rank
extract URI
for each result
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
UI-QA-Authority
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainampmaximumRecords=2
[urihttpidworldcatorgfast31622id31622
labelTwain Mark 1835-1910
urihttpidworldcatorgfast365563id365563
labelTwain Shania
httpexperimentalworldcatorgfastsearchquery=o
clcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
RDF of
search
results
Active-Triples
LDF Cache
(Marmotta or
Blazegraph) LDF Cache Jena-Fuseki-
Lucene
Cache
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
search of cache performed via Lucene index
Third Set of Challenges
8 Disambiguation through better context eg expand from just prefLabel tohellip
preLabel altLabel birthdeath dates occupation etc
9 Reconciliation across multiple sources eg match LoC URI to OCLC FAST URI
53
Whatrsquos next
Addressing Architectural Challenges
bull Generalize process for accessing context on the
cache server and in the normalization layer
bull Multi-authority search and reconciliation
bull Address the need for cache refresh
bull Mirrored cache servers
User Experience and Design
bull User-centered Design
bull Observe listen learn design evaluate iterate
bull Iteratively design and evaluate UI for lookupauthorities
with catalogers
bull Search result rankingorderingfiltering for catalogers
bull Additional UI platforms eg FOLIO
56
Questions
httptinyurlcomld4l-auth-access
Appendix for Challenges 1-4
Challenge 1 Documentation
58
LoC httpidlocgovtechcenter
C Harlow notes on reconciling LoC - httpsgithubcomcmh2166lc-reconcile
OCLC FAST
httpswwwoclcorgdeveloperdevelopweb-servicesfast-apilinked-dataenhtml
GeoNames
httpwwwgeonamesorgexportgeonames-searchhtml
AGROVOC httpaimsfaoorgvest-registryvocabulariesagrovoc-multilingual-agricultural-thesaurus
swagger config httpsgithubcomNatLibFiSkosmosblobmasterswaggerjson
NALT
httpsagclassnalusdagov
DBpedia httpwikidbpediaorgOnlineAccess1220Public20Faceted20Web20Service20Inter
face
Challenge 2 Linked Data Access API
59
for Search Query for Term Fetch
LoC not supported URI
OCLC FAST httpexperimentalworldcatorgfastsearchq
uery=subauth+all+22query22ampsortK
eys=usageampmaximumRecords=maximumR
ecords
URI
GeoNames httpapigeonamesorgsearchq=queryamp
maxRows=maxRowsampusername=userna
meamptype=rdf
URI
AGROVOC httpartemideartuniroma2it8081agrovocr
estv1searchquery=queryamplang=lang
httpartemideartuniroma2it8081agrovo
crestv1datauri=httpaimsfaoorgaosa
grovocterm_id
NALT httpskosmoslibrarycornelledurestv1nalt
searchquery=queryamplang=lang
httpskosmoslibrarycornelledurestv1na
ltdatauri=term_uri
DBpedia
Challenge 3 Varying Results Formats
60
for Search Query for Term Fetch
LoC not supported rdf-xml
OCLC FAST rdf-xml rdf-xml
GeoNames rdf-xml rdf-xml
AGROVOC json-ld rdf-xml json-ld turtle
NALT json-ld rdf-xml json-ld turtle
DBpedia
Challenge 4: Varying Ontologies
Authority | Primary ontology | Flat vs. navigation required
LoC | madsrdf, SKOS | navigation required
OCLC FAST | schema.org, SKOS | flat
GeoNames | geonames | flat, hierarchical
AGROVOC | SKOS | flat, hierarchical
NALT | SKOS | flat, hierarchical
DBpedia | dbpedia | flat
Configurations for Questioning Authority
LoC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_loc/config/authorities/linked_data
OCLC FAST: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_oclcfast/config/authorities/linked_data
GeoNames: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_geonames/config/authorities/linked_data
AGROVOC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_agrovoc/config/authorities/linked_data
NALT: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_nalt/config/authorities/linked_data
DBpedia: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_dbpedia/config/authorities/linked_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
• 8-core, 64 GB, 3 GHz Mac Pro (late 2013), macOS Sierra (10.12.6)
• 32 TB Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
• Apache Jena Fuseki 2.4.0 provides the SPARQL endpoint
• Apache Tomcat 9.0 runs custom web application(s)
• Apache Lucene 3.6 provides the search interface
Customizations
• Custom per-data-source JSP web application provides search/browse/download functionality
• Custom (generic) SPARQL Tag Library provides API for web apps (available at https://github.com/eichmann/lod-utilities)
• Custom (generic) Lucene Tag Library provides API for web apps
Loading a New Vocabulary
• Download the RDF
• If necessary, convert to n-triples (required for GeoNames data, for instance)
• Use tdbloader2 to populate the triplestore
• Configure the Fuseki server(s) with the triplestore details
• Create a new JSP project in Eclipse
• Write one or more indexer programs that populate the Lucene indices, and run the indexer(s)
• Write the search/browse/download application logic using the SPARQL and Lucene tags
• Package the project as a WAR
• Deploy to the Apache Tomcat server(s)
• Add the new service to the Apache HTTPD virtual host specification
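The first steps of the loading list (convert to n-triples, then run tdbloader2) might be scripted roughly as follows. This is only a sketch: the helper name `build_load_commands`, the file names, and the use of Jena's `riot` tool for the conversion are assumptions not stated in the slides, and GeoNames data in particular needs its own preprocessing before any such conversion. The function only builds the command lines and does not run anything.

```python
from pathlib import Path


def build_load_commands(rdf_file: str, tdb_dir: str) -> list[list[str]]:
    """Return the command lines for converting and loading one vocabulary.

    Hypothetical helper: it only assembles argument lists so they can be
    inspected or passed to subprocess.run() by the caller.
    """
    source = Path(rdf_file)
    commands = []
    if source.suffix != ".nt":
        # Assumption: Jena's riot converts the download to N-Triples.
        # riot writes to stdout; the caller redirects it to the .nt file.
        commands.append(["riot", "--output=NT", str(source)])
        source = source.with_suffix(".nt")
    # tdbloader2 bulk-loads the N-Triples file into a TDB directory.
    commands.append(["tdbloader2", "--loc", tdb_dir, str(source)])
    return commands


if __name__ == "__main__":
    for command in build_load_commands("vocab.rdf", "/data/tdb/vocab"):
        print(" ".join(command))
```

The remaining steps (Fuseki configuration, JSP project, indexers, WAR deployment) are specific to the cache server setup and are not covered by this sketch.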
UI Access to Cache Server
http://services.ld4l.org/ld4l_services/loc_name.jsp
Downloads
LoC: http://id.loc.gov/download/ (n-triples OR rdf-xml)
OCLC FAST: http://www.oclc.org/research/themes/data-science/fast/download.html (n-triples)
GeoNames: http://www.geonames.org/ontology/documentation.html (custom format - see notes for processing)
AGROVOC: https://aims-fao.atlassian.net/wiki/spaces/AGV/pages/2949126/Releases (n-triples OR rdf-xml)
NALT: https://agclass.nal.usda.gov/download.shtml (rdf-xml)
DBpedia: http://wiki.dbpedia.org/downloads-2016-04
Potential Options for Reconciliation
• VIAF for name reconciliation - we are doing some work with this
• Wikidata - I've heard that they are working on reconciliation issues, but haven't yet explored in depth
  • Intro Video (3 hrs)
  • API Access
  • SPARQL - user manual
  • Federated queries with other authorities
Doing a Google search for "linked data reconciliation" returns a large number of articles and presentations on this concept.
Links to Code & More
• QA Server - Code for a small app that provides the Questioning Authority normalization layer
• Linked Data Authorities - Configurations that can be used with QA Server
• LD4L Services - UI access to Cache Server
• VitroLib - Code for the VitroLib cataloging tool
Multi-Server Architecture
Hyrax/VitroLib - UI for selecting an entry from an authority
QA - normalize RDF returned from an authority
  Request to QA:
  http://localhost:3000/qa/search/linked_data/oclc_fast/personal_name?q=twain&maximumRecords=2
  Normalized response from QA:
  [{"uri":"http://id.worldcat.org/fast/31622","id":"31622","label":"Twain, Mark, 1835-1910"},
   {"uri":"http://id.worldcat.org/fast/365563","id":"365563","label":"Twain, Shania"}]
Direct Access of External Authority
  Request to the authority:
  http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&sortKeys=usage&maximumRecords=2
  RDF returned by the authority:
  <http://id.worldcat.org/fast/31622>
    a schema:Person ;
    dcterms:identifier "31622" ;
    skos:prefLabel "Twain, Mark, 1835-1910" ;
    skos:altLabel "Make Teviin 1835-1910", "Make Tuwen 1835-1910" .
  <http://id.worldcat.org/fast/365563>
    a schema:Person ;
    dcterms:identifier "365563" ;
    skos:prefLabel "Twain, Shania" ;
    skos:altLabel "Twain Eilleen", "Edwards Eilleen" .
Direct Access Query API
Direct against authority...
http://experimental.worldcat.org/fast/search
  ?query=oclc.personalName+%22twain%22
  &maximumRecords=2
http://api.geonames.org/search
  ?q=ithaca
  &maxRows=2
  &username=demo
  &type=rdf
http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search
  ?query=milk
  &lang=en
  &maxhits=2
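The three direct queries differ in parameter names for the same two ideas: the search text and the result limit. A minimal sketch of that difference, assuming a hypothetical lookup table `AUTHORITIES` and helper `direct_url` (neither is part of any of these services), with base URLs as reconstructed from the slide:

```python
from urllib.parse import urlencode

# Hypothetical table: base URL, name of the query parameter, name of the
# limit parameter, and any fixed extra parameters shown on the slide.
AUTHORITIES = {
    "oclc_fast": ("http://experimental.worldcat.org/fast/search",
                  "query", "maximumRecords", {}),
    "geonames": ("http://api.geonames.org/search",
                 "q", "maxRows", {"username": "demo", "type": "rdf"}),
    "agrovoc": ("http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search",
                "query", "maxhits", {"lang": "en"}),
}


def direct_url(authority: str, query: str, limit: int) -> str:
    """Build the authority-specific search URL for one query."""
    base, query_param, limit_param, extras = AUTHORITIES[authority]
    params = {query_param: query, limit_param: limit, **extras}
    # urlencode turns spaces into '+' and double quotes into %22.
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    print(direct_url("geonames", "ithaca", 2))
    print(direct_url("oclc_fast", 'oclc.personalName "twain"', 2))
```

Every client that talks to the authorities directly has to carry a table like this one, which is the burden the normalization layer is meant to remove.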
Normalized Query API
Through QA normalization layer...
http://localhost:3000/qa/search/linked_data/oclc_fast
  ?q=twain
  &maxRecords=2
http://localhost:3000/qa/search/linked_data/geonames
  ?q=ithaca
  &maxRecords=2
http://localhost:3000/qa/search/linked_data/agrovoc
  ?q=milk
  &maxRecords=2
  &lang=en
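Through the normalization layer, the same request shape works for every authority. A small sketch of a client-side URL builder, assuming the localhost host and port shown on the slide; the function name `qa_search_url` and its arguments are illustrative only:

```python
from urllib.parse import urlencode

# Host and port as shown on the slide; adjust for a real deployment.
QA_BASE = "http://localhost:3000/qa/search/linked_data"


def qa_search_url(authority, query, max_records=None, lang=None,
                  subauthority=None):
    """Build a normalized search URL; only the authority name varies."""
    path = f"{QA_BASE}/{authority}"
    if subauthority:
        path += f"/{subauthority}"
    params = {"q": query}
    if max_records is not None:
        params["maxRecords"] = max_records
    if lang:
        params["lang"] = lang
    return f"{path}?{urlencode(params)}"


if __name__ == "__main__":
    for name, text in [("oclc_fast", "twain"), ("geonames", "ithaca")]:
        print(qa_search_url(name, text, max_records=2))
    print(qa_search_url("agrovoc", "milk", max_records=2, lang="en"))
```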
Normalized Results
OCLC FAST:
[{"uri":"http://id.worldcat.org/fast/31622","id":"31622","label":"Twain, Mark, 1835-1910"},
 {"uri":"http://id.worldcat.org/fast/365563","id":"365563","label":"Twain, Shania"}]
GeoNames:
[{"uri":"http://sws.geonames.org/2162552/","id":"http://sws.geonames.org/2162552/","label":"Ithaca (AU)"},
 {"uri":"http://sws.geonames.org/4515289/","id":"http://sws.geonames.org/4515289/","label":"Ithaca (US)"}]
AGROVOC:
[{"uri":"http://aims.fao.org/aos/agrovoc/c_8602","id":"http://aims.fao.org/aos/agrovoc/c_8602","label":"acidophilus milk"},
 {"uri":"http://aims.fao.org/aos/agrovoc/c_16076","id":"http://aims.fao.org/aos/agrovoc/c_16076","label":"buffalo milk"}]
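The normalized results all share the uri/id/label shape, whatever the source RDF looked like. The sketch below illustrates the idea on plain (subject, predicate, object) tuples; it is not the Questioning Authority implementation, and the choice of dcterms:identifier for the id and skos:prefLabel for the label is an assumption based on the OCLC FAST example. When no identifier is present, the id falls back to the URI, as in the GeoNames and AGROVOC results.

```python
DCTERMS_IDENTIFIER = "http://purl.org/dc/terms/identifier"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def normalize(triples):
    """Collapse (subject, predicate, object) triples into uri/id/label records."""
    records = {}
    for subject, predicate, obj in triples:
        # The id defaults to the URI until an identifier triple is seen.
        record = records.setdefault(
            subject, {"uri": subject, "id": subject, "label": ""})
        if predicate == DCTERMS_IDENTIFIER:
            record["id"] = obj
        elif predicate == SKOS_PREF_LABEL:
            record["label"] = obj
    # Dicts keep insertion order, so results follow first appearance.
    return list(records.values())


if __name__ == "__main__":
    fast = "http://id.worldcat.org/fast/31622"
    print(normalize([
        (fast, DCTERMS_IDENTIFIER, "31622"),
        (fast, SKOS_PREF_LABEL, "Twain, Mark, 1835-1910"),
    ]))
```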
Second Set of Challenges
5. Reliability & Efficiency, e.g., server uptime, server load
6. Accuracy, e.g., select results based on usage data, lexical match, custom weighting, other
7. Order Ranking, e.g., how to order a graph
Cache Server Query Process
One full setup per authority: JSP Query API, Lucene/SOLR index, and Jena-Fuseki triplestore.
• The JSP Query API receives the request:
  http://services.ld4l.org/ld4l_services/loc_name_batch.jsp?query=ezra%20cornell&maxRecords=10
• Lucene search for "ezra cornell"; the index is built with the values of the predicates <skos:prefLabel> and <skos:altLabel>
• For each result: extract the search rank and extract the URI
• SPARQL query for the URI against the Jena-Fuseki triplestore
• Insert the search rank in the predicate <http://vivoweb.org/ontology/core#rank>
• Combine all results
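The query process on the cache server can be illustrated with a small in-memory simulation. This is a sketch under loud assumptions: a token dictionary stands in for the Lucene index, a list filter stands in for the SPARQL query against Fuseki, ranks are assigned in URI order rather than by a real relevance score, and the example.org URIs are made up. Only the overall flow (search labels, take rank and URI per hit, fetch the triples, add a rank triple, combine) follows the slides.

```python
RANK = "http://vivoweb.org/ontology/core#rank"
PREF = "http://www.w3.org/2004/02/skos/core#prefLabel"
ALT = "http://www.w3.org/2004/02/skos/core#altLabel"


def build_index(triples):
    """Stand-in for the Lucene index: label token -> set of URIs."""
    index = {}
    for subject, predicate, obj in triples:
        if predicate in (PREF, ALT):
            for token in obj.lower().replace(",", " ").split():
                index.setdefault(token, set()).add(subject)
    return index


def search(index, query, max_records):
    """Return (rank, uri) pairs for URIs whose labels contain every token."""
    tokens = query.lower().split()
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    # Assumption: URI order replaces the relevance ranking Lucene would give.
    return list(enumerate(sorted(hits)[:max_records], start=1))


def describe(triples, uri):
    """Stand-in for the SPARQL query: all triples about one URI."""
    return [t for t in triples if t[0] == uri]


def cached_search(triples, index, query, max_records=10):
    """For each hit, fetch its triples, add a rank triple, and combine."""
    combined = []
    for rank, uri in search(index, query, max_records):
        combined.extend(describe(triples, uri))
        combined.append((uri, RANK, str(rank)))
    return combined


if __name__ == "__main__":
    data = [
        ("http://example.org/n1", PREF, "Cornell, Ezra, 1807-1874"),
        ("http://example.org/n2", PREF, "Cornell University"),
    ]
    for triple in cached_search(data, build_index(data), "ezra cornell"):
        print(triple)
```

Carrying the rank as an extra triple lets the search order survive the trip through RDF, which otherwise has no inherent ordering.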
UI-QA-Authority
Hyrax/VitroLib - UI for selecting an entry from an authority
QA - normalize RDF returned from an authority
  Request from the UI to QA:
  http://localhost:3000/qa/search/linked_data/oclc_fast/personal_name?q=twain&maximumRecords=2
  Normalized response from QA:
  [{"uri":"http://id.worldcat.org/fast/31622","id":"31622","label":"Twain, Mark, 1835-1910"},
   {"uri":"http://id.worldcat.org/fast/365563","id":"365563","label":"Twain, Shania"}]
  Request from QA to the source:
  http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&sortKeys=usage&maximumRecords=2
  Response to QA: RDF of search results
Diagram components:
• Active-Triples
• LDF Cache (Marmotta or Blazegraph)
• LDF Cache
• Jena-Fuseki-Lucene Cache (search of cache performed via Lucene index)
• Direct Access of External Authority
Third Set of Challenges
8. Disambiguation through better context, e.g., expand from just prefLabel to prefLabel, altLabel, birth/death dates, occupation, etc.
9. Reconciliation across multiple sources, e.g., match LoC URI to OCLC FAST URI
Normalized Results
OCLC FAST:
[{"uri":"http://id.worldcat.org/fast/31622","id":"31622","label":"Twain, Mark, 1835-1910"}, {"uri":"http://id.worldcat.org/fast/365563","id":"365563","label":"Twain, Shania"}]
GeoNames:
[{"uri":"http://sws.geonames.org/2162552/","id":"http://sws.geonames.org/2162552/","label":"Ithaca (AU)"}, {"uri":"http://sws.geonames.org/4515289/","id":"http://sws.geonames.org/4515289/","label":"Ithaca (US)"}]
AGROVOC:
[{"uri":"http://aims.fao.org/aos/agrovoc/c_8602","id":"http://aims.fao.org/aos/agrovoc/c_8602","label":"acidophilus milk"}, {"uri":"http://aims.fao.org/aos/agrovoc/c_16076","id":"http://aims.fao.org/aos/agrovoc/c_16076","label":"buffalo milk"}]
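The normalized results above share one shape, a list of uri/id/label entries, whichever authority produced them. A minimal Python sketch of that reduction follows. It is not the Questioning Authority code: the triple input format, the function names `normalize` and `id_from_uri`, and the `use_local_id` switch are assumptions made for illustration.

```python
# Hypothetical sketch: reduce (subject, predicate, object) triples returned by
# an authority to the uniform uri/id/label shape shown on the slide.
PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def id_from_uri(uri, use_local_id):
    """Return the last path segment (as in the FAST example) or the whole URI
    (as in the GeoNames and AGROVOC examples)."""
    return uri.rstrip("/").rsplit("/", 1)[-1] if use_local_id else uri


def normalize(triples, label_predicate=PREF_LABEL, use_local_id=True):
    """Keep one entry per subject that carries the chosen label predicate."""
    results = []
    seen = set()
    for subject, predicate, obj in triples:
        if predicate == label_predicate and subject not in seen:
            seen.add(subject)
            results.append({
                "uri": subject,
                "id": id_from_uri(subject, use_local_id),
                "label": obj,
            })
    return results


fast_triples = [
    ("http://id.worldcat.org/fast/31622", PREF_LABEL, "Twain, Mark, 1835-1910"),
    ("http://id.worldcat.org/fast/365563", PREF_LABEL, "Twain, Shania"),
]
print(normalize(fast_triples))
```

Passing `use_local_id=False` keeps the full URI as the id, which matches the GeoNames and AGROVOC rows.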
Second Set of Challenges
5. Reliability & Efficiency, e.g., server uptime, server load
6. Accuracy, e.g., select results based on usage data, lexical match, custom weighting, other
7. Order Ranking, e.g., how to order a graph?
Cache Server Query Process
One full setup per authority: JSP Query API, Lucene/SOLR Index, Jena-Fuseki Triplestore
1. JSP Query API receives the request http://services.ld4l.org/ld4l_services/loc_name_batch.jsp?query=ezra%20cornell&maxRecords=10
2. Lucene search for "ezra cornell"; index built with predicate values <skos:prefLabel>, <skos:altLabel>
3. For each result: extract search rank, extract URI
4. SPARQL query for URI
5. Insert search rank in predicate <http://vivoweb.org/ontology/core#rank>
6. Combine all results
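The query process on this slide can be sketched in a few lines of Python. This is a toy stand-in, not the JSP implementation: the in-memory `TRIPLESTORE`, its example.org URIs and labels, the term-overlap scoring that replaces Lucene, and every function name are assumptions. Only the request URL, the indexed predicates, and the rank predicate come from the slide.

```python
from urllib.parse import urlencode, quote

SKOS = "http://www.w3.org/2004/02/skos/core#"
PREF_LABEL = SKOS + "prefLabel"
ALT_LABEL = SKOS + "altLabel"
RANK = "http://vivoweb.org/ontology/core#rank"

# Made-up data standing in for one authority's triplestore: URI -> [(predicate, value)].
TRIPLESTORE = {
    "http://example.org/auth/1": [(PREF_LABEL, "Cornell, Ezra, 1807-1874")],
    "http://example.org/auth/2": [(PREF_LABEL, "Cornell University"), (ALT_LABEL, "Cornell Univ.")],
    "http://example.org/auth/3": [(PREF_LABEL, "Twain, Mark, 1835-1910")],
}


def request_url(base, query, max_records):
    """Build the batch query URL; spaces are encoded as %20, as on the slide."""
    return base + "?" + urlencode({"query": query, "maxRecords": max_records}, quote_via=quote)


def build_index(store):
    """Index only prefLabel and altLabel values."""
    return [(uri, value.lower()) for uri, triples in store.items()
            for predicate, value in triples if predicate in (PREF_LABEL, ALT_LABEL)]


def search(index, query, max_records):
    """Toy term-overlap scoring, a stand-in for a real Lucene search."""
    terms = query.lower().split()
    best = {}
    for uri, text in index:
        score = sum(term in text for term in terms)
        if score:
            best[uri] = max(score, best.get(uri, 0))
    return sorted(best, key=lambda u: (-best[u], u))[:max_records]


def query_cache(store, index, query, max_records=10):
    """For each hit: fetch its triples, attach the search rank, combine everything."""
    combined = []
    for rank, uri in enumerate(search(index, query, max_records), start=1):
        combined.extend((uri, p, o) for p, o in store[uri])  # stands in for the SPARQL query per URI
        combined.append((uri, RANK, rank))                   # insert search rank in predicate
    return combined


print(request_url("http://services.ld4l.org/ld4l_services/loc_name_batch.jsp", "ezra cornell", 10))
for triple in query_cache(TRIPLESTORE, build_index(TRIPLESTORE), "ezra cornell"):
    print(triple)
```

Carrying the rank as an ordinary triple means the combined result is still a graph, and the consumer can recover the ordering by sorting on that predicate.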
UI-QA-Authority
• Hyrax/VitroLib - UI for selecting an entry from an authority
• QA - normalize RDF returned from an authority
• Authority, reached through one of: Active-Triples LDF Cache (Marmotta or Blazegraph); Jena-Fuseki-Lucene Cache (search of cache performed via Lucene index); Direct Access of External Authority
Request from the UI to QA:
http://localhost:3000/qa/search/linked_data/oclc_fast/personal_name?q=twain&maximumRecords=2
Request from QA to the authority, which returns RDF of search results:
http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&sortKeys=usage&maximumRecords=2
Normalized response from QA to the UI:
[{"uri":"http://id.worldcat.org/fast/31622","id":"31622","label":"Twain, Mark, 1835-1910"}, {"uri":"http://id.worldcat.org/fast/365563","id":"365563","label":"Twain, Shania"}]
Third Set of Challenges
8. Disambiguation through better context, e.g., expand from just prefLabel to prefLabel, altLabel, birth/death dates, occupation, etc.
9. Reconciliation across multiple sources, e.g., match LoC URI to OCLC FAST URI
What's next?
Addressing Architectural Challenges
• Generalize process for accessing context on the cache server and in the normalization layer
• Multi-authority search and reconciliation
• Address the need for cache refresh
• Mirrored cache servers
User Experience and Design
• User-centered Design
• Observe, listen, learn, design, evaluate, iterate
• Iteratively design and evaluate UI for lookup/authorities with catalogers
• Search result ranking/ordering/filtering for catalogers
• Additional UI platforms, e.g., FOLIO
Questions
http://tinyurl.com/ld4l-auth-access
Appendix for Challenges 1-4
Challenge 1: Documentation
LoC: http://id.loc.gov/techcenter
C. Harlow notes on reconciling LoC - https://github.com/cmh2166/lc-reconcile
OCLC FAST: https://www.oclc.org/developer/develop/web-services/fast-api/linked-data.en.html
GeoNames: http://www.geonames.org/export/geonames-search.html
AGROVOC: http://aims.fao.org/vest-registry/vocabularies/agrovoc-multilingual-agricultural-thesaurus
swagger config: https://github.com/NatLibFi/Skosmos/blob/master/swagger.json
NALT: https://agclass.nal.usda.gov
DBpedia: http://wiki.dbpedia.org/OnlineAccess#1.2%20Public%20Faceted%20Web%20Service%20Interface
Challenge 2: Linked Data Access API
Authority | for Search Query | for Term Fetch
LoC | not supported | URI
OCLC FAST | http://experimental.worldcat.org/fast/search?query=subauth+all+%22query%22&sortKeys=usage&maximumRecords=maximumRecords | URI
GeoNames | http://api.geonames.org/search?q=query&maxRows=maxRows&username=username&type=rdf | URI
AGROVOC | http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search?query=query&lang=lang | http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/data?uri=http://aims.fao.org/aos/agrovoc/term_id
NALT | http://skosmos.library.cornell.edu/rest/v1/nalt/search?query=query&lang=lang | http://skosmos.library.cornell.edu/rest/v1/nalt/data?uri=term_uri
DBpedia | |
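Because every authority exposes a different search URL, a lookup layer needs some per-authority template to fill in. The sketch below shows one way to do that in Python for two of the rows above. The dictionary keys, the `{name}` placeholder syntax, and the `search_url` function are assumptions for illustration; they are not the Questioning Authority configuration format.

```python
from urllib.parse import quote

# Search URL patterns transcribed from the table; {name} marks a value to fill in.
# LoC has no entry because the table lists search as "not supported".
SEARCH_TEMPLATES = {
    "oclc_fast": ("http://experimental.worldcat.org/fast/search"
                  "?query={subauth}+all+%22{query}%22"
                  "&sortKeys=usage&maximumRecords={max_records}"),
    "geonames": ("http://api.geonames.org/search"
                 "?q={query}&maxRows={max_records}&username={username}&type=rdf"),
}


def search_url(authority, query, **params):
    """Fill the authority's template, percent-encoding every supplied value."""
    template = SEARCH_TEMPLATES.get(authority)
    if template is None:
        raise ValueError(f"no search API configured for {authority!r}")
    values = {name: quote(str(value), safe="") for name, value in params.items()}
    return template.format(query=quote(query, safe=""), **values)


print(search_url("oclc_fast", "twain", subauth="oclc.personalName", max_records=2))
print(search_url("geonames", "Ithaca", max_records=5, username="demo"))
```

Keeping the per-authority differences in data rather than in code is what lets a single normalization layer sit in front of all of them.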
Challenge 3: Varying Results Formats
Authority | for Search Query | for Term Fetch
LoC | not supported | rdf-xml
OCLC FAST | rdf-xml | rdf-xml
GeoNames | rdf-xml | rdf-xml
AGROVOC | json-ld | rdf-xml, json-ld, turtle
NALT | json-ld | rdf-xml, json-ld, turtle
DBpedia | |
Challenge 4: Varying Ontologies
Authority | Primary Ontology | Flat vs. Navigation required
LoC | madsrdf, SKOS | navigation required
OCLC FAST | schema.org, SKOS | flat
GeoNames | geonames | flat, hierarchical
AGROVOC | SKOS | flat
NALT | SKOS | flat, hierarchical
DBpedia | dbpedia | flat
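The "flat" versus "navigation required" distinction can be illustrated with a small path-following helper: a flat value sits one predicate away from the term, while a navigated value is reached through an intermediate node. This Python sketch uses made-up example.org predicates and data; it does not reproduce the actual madsrdf or SKOS structures of any authority.

```python
EX = "http://example.org/vocab#"  # hypothetical namespace for illustration


def follow(graph, subject, path):
    """Return all values reached from subject by following the predicates in path, in order."""
    nodes = [subject]
    for predicate in path:
        nodes = [o for node in nodes
                 for (s, p, o) in graph
                 if s == node and p == predicate]
    return nodes


term = "http://example.org/term/1"
graph = [
    # Flat: the value hangs directly off the term.
    (term, EX + "label", "Milk"),
    # Navigation required: the value hangs off an intermediate node.
    (term, EX + "hasVariant", "_:b1"),
    ("_:b1", EX + "variantLabel", "Milk products"),
]

print(follow(graph, term, [EX + "label"]))                            # one hop
print(follow(graph, term, [EX + "hasVariant", EX + "variantLabel"]))  # two hops
```

A per-authority configuration could then store one predicate path per field, with single-element paths for flat sources and longer ones where navigation is needed.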
Configurations for Questioning Authority
LoC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_loc/config/authorities/linked_data
OCLC FAST: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_oclcfast/config/authorities/linked_data
GeoNames: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_geonames/config/authorities/linked_data
AGROVOC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_agrovoc/config/authorities/linked_data
NALT: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_nalt/config/authorities/linked_data
DBpedia: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_dbpedia/config/authorities/linked_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
• 8-core, 64 GB, 3 GHz Mac Pro (late 2013), macOS Sierra (10.12.6)
• 32 TB Pegasus-2 Thunderbolt RAID, configured as RAID-5
Triplestore
• Apache Jena Fuseki 2.4.0 provides SPARQL endpoint
• Apache Tomcat 9.0 runs custom web application(s)
• Apache Lucene 3.6 provides search interface
Customizations
• custom per-data-source JSP web application provides search/browse/download functionality
• custom (generic) SPARQL Tag Library provides API for web apps (available at https://github.com/eichmann/lod-utilities)
• custom (generic) Lucene Tag Library provides API for web apps
Cache Server Query Process
JSP Query API
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
Jena-Fuseki
Triplestore
LuceneSOLR
Index
One full setup per authority
Cache Server Query Process
JSP Query API
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
Jena-Fuseki
Triplestore
LuceneSOLR
Index
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
extract search rank
extract URI
Jena-Fuseki
Triplestore
for each result
LuceneSOLR
Index
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
sparql query for URI
Jena-Fuseki
Triplestore
LuceneSOLR
Index
extract search rank
extract URI
for each result
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
combine all results
Jena-Fuseki
Triplestore
insert search rank in predicate
lthttpvivoweborgontology
corerankgt
LuceneSOLR
Index
sparql query for URI extract search rank
extract URI
for each result
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
UI-QA-Authority
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainampmaximumRecords=2
[urihttpidworldcatorgfast31622id31622
labelTwain Mark 1835-1910
urihttpidworldcatorgfast365563id365563
labelTwain Shania
httpexperimentalworldcatorgfastsearchquery=o
clcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
RDF of
search
results
Active-Triples
LDF Cache
(Marmotta or
Blazegraph) LDF Cache Jena-Fuseki-
Lucene
Cache
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
search of cache performed via Lucene index
Third Set of Challenges
8 Disambiguation through better context eg expand from just prefLabel tohellip
preLabel altLabel birthdeath dates occupation etc
9 Reconciliation across multiple sources eg match LoC URI to OCLC FAST URI
53
Whatrsquos next
Addressing Architectural Challenges
bull Generalize process for accessing context on the
cache server and in the normalization layer
bull Multi-authority search and reconciliation
bull Address the need for cache refresh
bull Mirrored cache servers
User Experience and Design
bull User-centered Design
bull Observe listen learn design evaluate iterate
bull Iteratively design and evaluate UI for lookupauthorities
with catalogers
bull Search result rankingorderingfiltering for catalogers
bull Additional UI platforms eg FOLIO
56
Questions
httptinyurlcomld4l-auth-access
Appendix for Challenges 1-4
Challenge 1 Documentation
58
LoC httpidlocgovtechcenter
C Harlow notes on reconciling LoC - httpsgithubcomcmh2166lc-reconcile
OCLC FAST
httpswwwoclcorgdeveloperdevelopweb-servicesfast-apilinked-dataenhtml
GeoNames
httpwwwgeonamesorgexportgeonames-searchhtml
AGROVOC httpaimsfaoorgvest-registryvocabulariesagrovoc-multilingual-agricultural-thesaurus
swagger config httpsgithubcomNatLibFiSkosmosblobmasterswaggerjson
NALT
httpsagclassnalusdagov
DBpedia httpwikidbpediaorgOnlineAccess1220Public20Faceted20Web20Service20Inter
face
Challenge 2 Linked Data Access API
59
for Search Query for Term Fetch
LoC not supported URI
OCLC FAST httpexperimentalworldcatorgfastsearchq
uery=subauth+all+22query22ampsortK
eys=usageampmaximumRecords=maximumR
ecords
URI
GeoNames httpapigeonamesorgsearchq=queryamp
maxRows=maxRowsampusername=userna
meamptype=rdf
URI
AGROVOC httpartemideartuniroma2it8081agrovocr
estv1searchquery=queryamplang=lang
httpartemideartuniroma2it8081agrovo
crestv1datauri=httpaimsfaoorgaosa
grovocterm_id
NALT httpskosmoslibrarycornelledurestv1nalt
searchquery=queryamplang=lang
httpskosmoslibrarycornelledurestv1na
ltdatauri=term_uri
DBpedia
Challenge 3 Varying Results Formats
60
for Search Query for Term Fetch
LoC not supported rdf-xml
OCLC FAST rdf-xml rdf-xml
GeoNames rdf-xml rdf-xml
AGROVOC json-ld rdf-xml json-ld turtle
NALT json-ld rdf-xml json-ld turtle
DBpedia
Challenge 4 Varying Ontologies
61
Primary Ontology Flat vs Navigation
required
LoC madsrdf
SKOS
navigation required
OCLC FAST schemaorg
SKOS
flat
GeoNames geonames flat
hierarchical
AGROVOC SKOS flat
hierarchical
NALT SKOS flat
hierarchical
DBpedia dbpedia flat
Configurations for Questioning Authority
62
LoC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_locconfigauthoritieslinked_dat
a
OCLC FAST httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_oclcfastconfigauthoritieslinked
_data
GeoNames httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_geonamesconfigauthoritieslink
ed_data
AGROVOC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_agrovocconfigauthoritieslinked
_data
NALT httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_naltconfigauthoritieslinked_dat
a
DBpedia httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_dbpediaconfigauthoritieslinked
_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
bull 8-core 64gb 3Ghz Mac Pro (late 2013) macOS Sierra
(10126)
bull 32tb Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
bull Apache Jena Fuseki 240 provides SPARQL endpoint
bull Apache Tomcat 90 runs custom web application(s)
bull Apache Lucene 36 provides search interface
64
Customizations
bull custom per-data-source JSP web application provides
searchbrowsedownload functionality
bull custom (generic) SPARQL Tag Library provides API for web
apps (available at httpsgithubcomeichmannlod-utilities)
bull custom (generic) Lucene Tag Library provides API for web apps
65
Loading a New Vocabulary
bull download RDF
bull if necessary convert to n-triples (required for GeoNames data for instance)
bull use tdbloader2 to populated triplestore
bull configure Fuseki server(s) with triplestore details
bull create new JSP project in Eclipse
bull write one or more indexer programs that populate Lucene indices and run indexer(s)
bull write searchbrowsedownload application logic using the SPARQL and Lucene tags
bull package project as war
bull deploy to Apache Tomcat server(s)
bull add new service to Apache HTTPD virtual host specification
66
UI Access to Cache Server
httpservicesld4lorgld4l_servicesloc_namejsp
Downloads
68
LoC httpidlocgovdownload (n-triples OR rdf-xml)
OCLC FAST httpwwwoclcorgresearchthemesdata-sciencefastdownloadhtml (n-triples)
GeoNames httpwwwgeonamesorgontologydocumentationhtml (custom format ndash see notes for processing)
AGROVOC httpsaims-faoatlassiannetwikispacesAGVpages2949126Releases (n-triples OR rdf-xml)
NALT httpsagclassnalusdagovdownloadshtml (rdf-xml)
DBpedia httpwikidbpediaorgdownloads-2016-04
Potential Options for Reconciliation
• VIAF for name reconciliation - we are doing some work with this
• Wikidata - I've heard that they are working on reconciliation issues, but haven't yet explored in depth
  • Intro Video (3 hrs)
  • API Access
  • SPARQL - user manual
  • federated queries with other authorities
Doing a Google search for "linked data reconciliation" returns a large number of articles and presentations on this concept.
Links to Code & More
• QA Server - Code for a small app that provides the Questioning Authority normalization layer
• Linked Data Authorities - Configurations that can be used with QA Server
• LD4L Services - UI access to Cache Server
• VitroLib - Code for the VitroLib cataloging tool
Cache Server Query Process
Components: JSP Query API, Lucene/SOLR Index, Jena-Fuseki Triplestore (one full setup per authority)
Example request: http://services.ld4l.org/ld4l_services/loc_name_batch.jsp?query=ezra%20cornell&maxRecords=10
1. Lucene search for "ezra cornell" (index built with predicate values <skos:prefLabel> and <skos:altLabel>)
2. For each result, extract the URI and the search rank
3. SPARQL query for the URI against the Jena-Fuseki triplestore
4. Insert the search rank in predicate <http://vivoweb.org/ontology/core#rank>
5. Combine all results
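The query flow on this slide can be sketched in Python with in-memory stand-ins. This is an illustration under stated assumptions, not the cache server's JSP code: the triple list replaces the Jena-Fuseki triplestore, a simple term-count score replaces Lucene ranking, and the example.org URIs and function names are made up.

```python
RANK = "http://vivoweb.org/ontology/core#rank"
PREF = "http://www.w3.org/2004/02/skos/core#prefLabel"
ALT = "http://www.w3.org/2004/02/skos/core#altLabel"

# Toy triplestore of (subject, predicate, object) tuples; URIs are made up.
TRIPLES = [
    ("http://example.org/auth/n1", PREF, "Cornell, Ezra, 1807-1874"),
    ("http://example.org/auth/n1", ALT, "Ezra Cornell"),
    ("http://example.org/auth/n2", PREF, "Cornell University"),
    ("http://example.org/auth/n3", PREF, "Twain, Mark, 1835-1910"),
]


def label_search(query, max_records):
    """Stand-in for the Lucene search: score URIs by query terms found in labels."""
    terms = query.lower().split()
    scores = {}
    for subject, predicate, obj in TRIPLES:
        if predicate in (PREF, ALT):
            hits = sum(term in obj.lower() for term in terms)
            scores[subject] = max(scores.get(subject, 0), hits)
    matched = [uri for uri, score in scores.items() if score > 0]
    # Best score first; ties broken by URI so the order is deterministic.
    matched.sort(key=lambda uri: (-scores[uri], uri))
    return matched[:max_records]


def describe(uri):
    """Stand-in for the SPARQL query: all triples whose subject is uri."""
    return [triple for triple in TRIPLES if triple[0] == uri]


def cache_query(query, max_records=10):
    """Search labels, fetch each hit's triples, add a rank triple, combine."""
    combined = []
    for rank, uri in enumerate(label_search(query, max_records), start=1):
        combined.extend(describe(uri))
        combined.append((uri, RANK, rank))
    return combined


results = cache_query("ezra cornell")
```

Carrying the rank as an extra triple keeps the response plain RDF, so a downstream normalization layer can order entries without a side channel.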
UI-QA-Authority
Hyrax/VitroLib - UI for selecting an entry from an authority
QA - normalize RDF returned from an authority
Request to QA: http://localhost:3000/qa/search/linked_data/oclc_fast/personal_name?q=twain&maximumRecords=2
Normalized response from QA:
[{"uri":"http://id.worldcat.org/fast/31622","id":"31622","label":"Twain, Mark, 1835-1910"},
{"uri":"http://id.worldcat.org/fast/365563","id":"365563","label":"Twain, Shania"}]
Request from QA to the authority: http://experimental.worldcat.org/fast/search?query=oclc.personalName+%22twain%22&sortKeys=usage&maximumRecords=2
The authority returns RDF of search results
Diagram labels for authority access: Active-Triples; LDF Cache (Marmotta or Blazegraph); LDF Cache; Jena-Fuseki-Lucene Cache (search of cache performed via Lucene index); Direct Access of External Authority
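The normalization step can be illustrated with a short Python sketch. It assumes the authority's RDF has already been parsed into (subject, predicate, object) tuples and that the label sits in skos:prefLabel; both are assumptions for illustration, and the function is not the Questioning Authority implementation (which is Ruby).

```python
import json

SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def normalize(triples, label_predicate=SKOS_PREF_LABEL):
    """Reduce parsed RDF triples to a list of uri/id/label entries."""
    results = []
    for subject, predicate, obj in triples:
        if predicate == label_predicate:
            # Assumed convention: the id is the last path segment of the URI.
            local_id = subject.rstrip("/").rsplit("/", 1)[-1]
            results.append({"uri": subject, "id": local_id, "label": obj})
    return results


# Triples mirroring the two FAST results shown on the slide.
parsed = [
    ("http://id.worldcat.org/fast/31622", SKOS_PREF_LABEL, "Twain, Mark, 1835-1910"),
    ("http://id.worldcat.org/fast/365563", SKOS_PREF_LABEL, "Twain, Shania"),
]
normalized = normalize(parsed)
print(json.dumps(normalized, indent=2))
```

Whatever format or ontology the authority uses, the UI then only has to handle this one small JSON shape.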
Third Set of Challenges
8. Disambiguation through better context, e.g., expand from just prefLabel to prefLabel, altLabel, birth/death dates, occupation, etc.
9. Reconciliation across multiple sources, e.g., match LoC URI to OCLC FAST URI
What's next
Addressing Architectural Challenges
• Generalize process for accessing context on the cache server and in the normalization layer
• Multi-authority search and reconciliation
• Address the need for cache refresh
• Mirrored cache servers
User Experience and Design
• User-centered Design
• Observe, listen, learn, design, evaluate, iterate
• Iteratively design and evaluate UI for lookup/authorities with catalogers
• Search result ranking/ordering/filtering for catalogers
• Additional UI platforms, e.g., FOLIO
Questions
http://tinyurl.com/ld4l-auth-access
Appendix for Challenges 1-4
Challenge 1: Documentation
LoC: http://id.loc.gov/techcenter/
  C. Harlow notes on reconciling LoC - https://github.com/cmh2166/lc-reconcile
OCLC FAST: https://www.oclc.org/developer/develop/web-services/fast-api/linked-data.en.html
GeoNames: http://www.geonames.org/export/geonames-search.html
AGROVOC: http://aims.fao.org/vest-registry/vocabularies/agrovoc-multilingual-agricultural-thesaurus
  swagger config: https://github.com/NatLibFi/Skosmos/blob/master/swagger.json
NALT: https://agclass.nal.usda.gov/
DBpedia: http://wiki.dbpedia.org/OnlineAccess#1.2%20Public%20Faceted%20Web%20Service%20Interface
Challenge 2: Linked Data Access API
LoC
  for Search Query: not supported
  for Term Fetch: URI
OCLC FAST
  for Search Query: http://experimental.worldcat.org/fast/search?query={subauth}+all+%22{query}%22&sortKeys=usage&maximumRecords={maximumRecords}
  for Term Fetch: URI
GeoNames
  for Search Query: http://api.geonames.org/search?q={query}&maxRows={maxRows}&username={username}&type=rdf
  for Term Fetch: URI
AGROVOC
  for Search Query: http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search?query={query}&lang={lang}
  for Term Fetch: http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/data?uri=http://aims.fao.org/aos/agrovoc/{term_id}
NALT
  for Search Query: http://skosmos.library.cornell.edu/rest/v1/nalt/search?query={query}&lang={lang}
  for Term Fetch: http://skosmos.library.cornell.edu/rest/v1/nalt/data?uri={term_uri}
DBpedia
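Because each authority exposes a different search URL pattern, a lookup layer needs some way to fill per-authority templates. The sketch below shows one way to do that for the GeoNames search pattern listed above; the {name} placeholder syntax, the helper name, and the "demo" username are assumptions for illustration, not the syntax used by the QA configurations.

```python
from urllib.parse import quote

# GeoNames search pattern from the slide, written with assumed {name} placeholders.
GEONAMES_SEARCH = (
    "http://api.geonames.org/search"
    "?q={query}&maxRows={maxRows}&username={username}&type=rdf"
)


def fill_template(template, **params):
    """Percent-encode each parameter value and substitute it into the template."""
    encoded = {name: quote(str(value), safe="") for name, value in params.items()}
    return template.format(**encoded)


url = fill_template(GEONAMES_SEARCH, query="ithaca new york", maxRows=5, username="demo")
print(url)
```

Keeping the pattern as data rather than code means a new authority can be supported by adding a template instead of writing a new client.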
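Challenge 3: Varying Results Formats
LoC
  for Search Query: not supported
  for Term Fetch: rdf-xml
OCLC FAST
  for Search Query: rdf-xml
  for Term Fetch: rdf-xml
GeoNames
  for Search Query: rdf-xml
  for Term Fetch: rdf-xml
AGROVOC
  for Search Query: json-ld
  for Term Fetch: rdf-xml, json-ld, turtle
NALT
  for Search Query: json-ld
  for Term Fetch: rdf-xml, json-ld, turtle
DBpedia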
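Challenge 4: Varying Ontologies
LoC
  Primary Ontology: madsrdf, SKOS
  Flat vs Navigation required: navigation required
OCLC FAST
  Primary Ontology: schema.org, SKOS
  Flat vs Navigation required: flat
GeoNames
  Primary Ontology: geonames
  Flat vs Navigation required: flat, hierarchical
AGROVOC
  Primary Ontology: SKOS
  Flat vs Navigation required: flat, hierarchical
NALT
  Primary Ontology: SKOS
  Flat vs Navigation required: flat, hierarchical
DBpedia
  Primary Ontology: dbpedia
  Flat vs Navigation required: flat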
Configurations for Questioning Authority
LoC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_loc/config/authorities/linked_data
OCLC FAST: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_oclcfast/config/authorities/linked_data
GeoNames: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_geonames/config/authorities/linked_data
AGROVOC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_agrovoc/config/authorities/linked_data
NALT: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_nalt/config/authorities/linked_data
DBpedia: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_dbpedia/config/authorities/linked_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
bull 8-core 64gb 3Ghz Mac Pro (late 2013) macOS Sierra
(10126)
bull 32tb Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
bull Apache Jena Fuseki 240 provides SPARQL endpoint
bull Apache Tomcat 90 runs custom web application(s)
bull Apache Lucene 36 provides search interface
64
Customizations
bull custom per-data-source JSP web application provides
searchbrowsedownload functionality
bull custom (generic) SPARQL Tag Library provides API for web
apps (available at httpsgithubcomeichmannlod-utilities)
bull custom (generic) Lucene Tag Library provides API for web apps
65
Loading a New Vocabulary
bull download RDF
bull if necessary convert to n-triples (required for GeoNames data for instance)
bull use tdbloader2 to populated triplestore
bull configure Fuseki server(s) with triplestore details
bull create new JSP project in Eclipse
bull write one or more indexer programs that populate Lucene indices and run indexer(s)
bull write searchbrowsedownload application logic using the SPARQL and Lucene tags
bull package project as war
bull deploy to Apache Tomcat server(s)
bull add new service to Apache HTTPD virtual host specification
66
UI Access to Cache Server
httpservicesld4lorgld4l_servicesloc_namejsp
Downloads
68
LoC httpidlocgovdownload (n-triples OR rdf-xml)
OCLC FAST httpwwwoclcorgresearchthemesdata-sciencefastdownloadhtml (n-triples)
GeoNames httpwwwgeonamesorgontologydocumentationhtml (custom format ndash see notes for processing)
AGROVOC httpsaims-faoatlassiannetwikispacesAGVpages2949126Releases (n-triples OR rdf-xml)
NALT httpsagclassnalusdagovdownloadshtml (rdf-xml)
DBpedia httpwikidbpediaorgdownloads-2016-04
Potential Options for Reconciliation
bull VIAF for name reconciliation ndash we are doing some
work with this
bull Wikidata ndash Ive heard that they are working on
Reconciliation issues but havent yet explored in
depth bull Intro Video (3hrs)
bull API Access
bull SPARQL ndash user manual
bull federated queries with other authorities
Doing a google search for linked data reconciliation
returns a large number of articles and presentations
on this concept
Links to Code amp More
bull QA Server - Code for a small app that provides the
Questioning Authority normalization layer
bull Linked Data Authorities - Configurations that can
be used with QA Server
bull LD4L Services - UI access to Cache Server
bull VitroLib - Code for the VitroLib cataloging tool
Cache Server Query Process
JSP Query API
extract search rank
extract URI
Jena-Fuseki
Triplestore
for each result
LuceneSOLR
Index
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
sparql query for URI
Jena-Fuseki
Triplestore
LuceneSOLR
Index
extract search rank
extract URI
for each result
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
combine all results
Jena-Fuseki
Triplestore
insert search rank in predicate
lthttpvivoweborgontology
corerankgt
LuceneSOLR
Index
sparql query for URI extract search rank
extract URI
for each result
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
UI-QA-Authority
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainampmaximumRecords=2
[urihttpidworldcatorgfast31622id31622
labelTwain Mark 1835-1910
urihttpidworldcatorgfast365563id365563
labelTwain Shania
httpexperimentalworldcatorgfastsearchquery=o
clcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
RDF of
search
results
Active-Triples
LDF Cache
(Marmotta or
Blazegraph) LDF Cache Jena-Fuseki-
Lucene
Cache
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
search of cache performed via Lucene index
Third Set of Challenges
8 Disambiguation through better context eg expand from just prefLabel tohellip
preLabel altLabel birthdeath dates occupation etc
9 Reconciliation across multiple sources eg match LoC URI to OCLC FAST URI
53
Whatrsquos next
Addressing Architectural Challenges
bull Generalize process for accessing context on the
cache server and in the normalization layer
bull Multi-authority search and reconciliation
bull Address the need for cache refresh
bull Mirrored cache servers
User Experience and Design
bull User-centered Design
bull Observe listen learn design evaluate iterate
bull Iteratively design and evaluate UI for lookupauthorities
with catalogers
bull Search result rankingorderingfiltering for catalogers
bull Additional UI platforms eg FOLIO
56
Questions
httptinyurlcomld4l-auth-access
Appendix for Challenges 1-4
Challenge 1 Documentation
58
LoC httpidlocgovtechcenter
C Harlow notes on reconciling LoC - httpsgithubcomcmh2166lc-reconcile
OCLC FAST
httpswwwoclcorgdeveloperdevelopweb-servicesfast-apilinked-dataenhtml
GeoNames
httpwwwgeonamesorgexportgeonames-searchhtml
AGROVOC httpaimsfaoorgvest-registryvocabulariesagrovoc-multilingual-agricultural-thesaurus
swagger config httpsgithubcomNatLibFiSkosmosblobmasterswaggerjson
NALT
httpsagclassnalusdagov
DBpedia httpwikidbpediaorgOnlineAccess1220Public20Faceted20Web20Service20Inter
face
Challenge 2 Linked Data Access API
59
for Search Query for Term Fetch
LoC not supported URI
OCLC FAST httpexperimentalworldcatorgfastsearchq
uery=subauth+all+22query22ampsortK
eys=usageampmaximumRecords=maximumR
ecords
URI
GeoNames httpapigeonamesorgsearchq=queryamp
maxRows=maxRowsampusername=userna
meamptype=rdf
URI
AGROVOC httpartemideartuniroma2it8081agrovocr
estv1searchquery=queryamplang=lang
httpartemideartuniroma2it8081agrovo
crestv1datauri=httpaimsfaoorgaosa
grovocterm_id
NALT httpskosmoslibrarycornelledurestv1nalt
searchquery=queryamplang=lang
httpskosmoslibrarycornelledurestv1na
ltdatauri=term_uri
DBpedia
Challenge 3 Varying Results Formats
60
for Search Query for Term Fetch
LoC not supported rdf-xml
OCLC FAST rdf-xml rdf-xml
GeoNames rdf-xml rdf-xml
AGROVOC json-ld rdf-xml json-ld turtle
NALT json-ld rdf-xml json-ld turtle
DBpedia
Challenge 4 Varying Ontologies
61
Primary Ontology Flat vs Navigation
required
LoC madsrdf
SKOS
navigation required
OCLC FAST schemaorg
SKOS
flat
GeoNames geonames flat
hierarchical
AGROVOC SKOS flat
hierarchical
NALT SKOS flat
hierarchical
DBpedia dbpedia flat
Configurations for Questioning Authority
62
LoC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_locconfigauthoritieslinked_dat
a
OCLC FAST httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_oclcfastconfigauthoritieslinked
_data
GeoNames httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_geonamesconfigauthoritieslink
ed_data
AGROVOC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_agrovocconfigauthoritieslinked
_data
NALT httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_naltconfigauthoritieslinked_dat
a
DBpedia httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_dbpediaconfigauthoritieslinked
_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
bull 8-core 64gb 3Ghz Mac Pro (late 2013) macOS Sierra
(10126)
bull 32tb Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
bull Apache Jena Fuseki 240 provides SPARQL endpoint
bull Apache Tomcat 90 runs custom web application(s)
bull Apache Lucene 36 provides search interface
64
Customizations
bull custom per-data-source JSP web application provides
searchbrowsedownload functionality
bull custom (generic) SPARQL Tag Library provides API for web
apps (available at httpsgithubcomeichmannlod-utilities)
bull custom (generic) Lucene Tag Library provides API for web apps
65
Loading a New Vocabulary
bull download RDF
bull if necessary convert to n-triples (required for GeoNames data for instance)
bull use tdbloader2 to populated triplestore
bull configure Fuseki server(s) with triplestore details
bull create new JSP project in Eclipse
bull write one or more indexer programs that populate Lucene indices and run indexer(s)
bull write searchbrowsedownload application logic using the SPARQL and Lucene tags
bull package project as war
bull deploy to Apache Tomcat server(s)
bull add new service to Apache HTTPD virtual host specification
66
UI Access to Cache Server
httpservicesld4lorgld4l_servicesloc_namejsp
Downloads
68
LoC httpidlocgovdownload (n-triples OR rdf-xml)
OCLC FAST httpwwwoclcorgresearchthemesdata-sciencefastdownloadhtml (n-triples)
GeoNames httpwwwgeonamesorgontologydocumentationhtml (custom format ndash see notes for processing)
AGROVOC httpsaims-faoatlassiannetwikispacesAGVpages2949126Releases (n-triples OR rdf-xml)
NALT httpsagclassnalusdagovdownloadshtml (rdf-xml)
DBpedia httpwikidbpediaorgdownloads-2016-04
Potential Options for Reconciliation
bull VIAF for name reconciliation ndash we are doing some
work with this
bull Wikidata ndash Ive heard that they are working on
Reconciliation issues but havent yet explored in
depth bull Intro Video (3hrs)
bull API Access
bull SPARQL ndash user manual
bull federated queries with other authorities
Doing a google search for linked data reconciliation
returns a large number of articles and presentations
on this concept
Links to Code amp More
bull QA Server - Code for a small app that provides the
Questioning Authority normalization layer
bull Linked Data Authorities - Configurations that can
be used with QA Server
bull LD4L Services - UI access to Cache Server
bull VitroLib - Code for the VitroLib cataloging tool
Cache Server Query Process
JSP Query API
sparql query for URI
Jena-Fuseki
Triplestore
LuceneSOLR
Index
extract search rank
extract URI
for each result
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
Cache Server Query Process
JSP Query API
combine all results
Jena-Fuseki
Triplestore
insert search rank in predicate
lthttpvivoweborgontology
corerankgt
LuceneSOLR
Index
sparql query for URI extract search rank
extract URI
for each result
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
UI-QA-Authority
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainampmaximumRecords=2
[urihttpidworldcatorgfast31622id31622
labelTwain Mark 1835-1910
urihttpidworldcatorgfast365563id365563
labelTwain Shania
httpexperimentalworldcatorgfastsearchquery=o
clcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
RDF of
search
results
Active-Triples
LDF Cache
(Marmotta or
Blazegraph) LDF Cache Jena-Fuseki-
Lucene
Cache
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
search of cache performed via Lucene index
Third Set of Challenges
8 Disambiguation through better context eg expand from just prefLabel tohellip
preLabel altLabel birthdeath dates occupation etc
9 Reconciliation across multiple sources eg match LoC URI to OCLC FAST URI
53
Whatrsquos next
Addressing Architectural Challenges
bull Generalize process for accessing context on the
cache server and in the normalization layer
bull Multi-authority search and reconciliation
bull Address the need for cache refresh
bull Mirrored cache servers
User Experience and Design
bull User-centered Design
bull Observe listen learn design evaluate iterate
bull Iteratively design and evaluate UI for lookupauthorities
with catalogers
bull Search result rankingorderingfiltering for catalogers
bull Additional UI platforms eg FOLIO
56
Questions
httptinyurlcomld4l-auth-access
Appendix for Challenges 1-4
Challenge 1 Documentation
58
LoC httpidlocgovtechcenter
C Harlow notes on reconciling LoC - httpsgithubcomcmh2166lc-reconcile
OCLC FAST
httpswwwoclcorgdeveloperdevelopweb-servicesfast-apilinked-dataenhtml
GeoNames
httpwwwgeonamesorgexportgeonames-searchhtml
AGROVOC httpaimsfaoorgvest-registryvocabulariesagrovoc-multilingual-agricultural-thesaurus
swagger config httpsgithubcomNatLibFiSkosmosblobmasterswaggerjson
NALT
httpsagclassnalusdagov
DBpedia httpwikidbpediaorgOnlineAccess1220Public20Faceted20Web20Service20Inter
face
Challenge 2 Linked Data Access API
59
for Search Query for Term Fetch
LoC not supported URI
OCLC FAST httpexperimentalworldcatorgfastsearchq
uery=subauth+all+22query22ampsortK
eys=usageampmaximumRecords=maximumR
ecords
URI
GeoNames httpapigeonamesorgsearchq=queryamp
maxRows=maxRowsampusername=userna
meamptype=rdf
URI
AGROVOC httpartemideartuniroma2it8081agrovocr
estv1searchquery=queryamplang=lang
httpartemideartuniroma2it8081agrovo
crestv1datauri=httpaimsfaoorgaosa
grovocterm_id
NALT httpskosmoslibrarycornelledurestv1nalt
searchquery=queryamplang=lang
httpskosmoslibrarycornelledurestv1na
ltdatauri=term_uri
DBpedia
Challenge 3 Varying Results Formats
60
for Search Query for Term Fetch
LoC not supported rdf-xml
OCLC FAST rdf-xml rdf-xml
GeoNames rdf-xml rdf-xml
AGROVOC json-ld rdf-xml json-ld turtle
NALT json-ld rdf-xml json-ld turtle
DBpedia
Challenge 4 Varying Ontologies
61
Primary Ontology Flat vs Navigation
required
LoC madsrdf
SKOS
navigation required
OCLC FAST schemaorg
SKOS
flat
GeoNames geonames flat
hierarchical
AGROVOC SKOS flat
hierarchical
NALT SKOS flat
hierarchical
DBpedia dbpedia flat
Configurations for Questioning Authority
62
LoC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_locconfigauthoritieslinked_dat
a
OCLC FAST httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_oclcfastconfigauthoritieslinked
_data
GeoNames httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_geonamesconfigauthoritieslink
ed_data
AGROVOC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_agrovocconfigauthoritieslinked
_data
NALT httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_naltconfigauthoritieslinked_dat
a
DBpedia httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_dbpediaconfigauthoritieslinked
_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
bull 8-core 64gb 3Ghz Mac Pro (late 2013) macOS Sierra
(10126)
bull 32tb Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
bull Apache Jena Fuseki 240 provides SPARQL endpoint
bull Apache Tomcat 90 runs custom web application(s)
bull Apache Lucene 36 provides search interface
64
Customizations
bull custom per-data-source JSP web application provides
searchbrowsedownload functionality
bull custom (generic) SPARQL Tag Library provides API for web
apps (available at httpsgithubcomeichmannlod-utilities)
bull custom (generic) Lucene Tag Library provides API for web apps
65
Loading a New Vocabulary
bull download RDF
bull if necessary convert to n-triples (required for GeoNames data for instance)
bull use tdbloader2 to populated triplestore
bull configure Fuseki server(s) with triplestore details
bull create new JSP project in Eclipse
bull write one or more indexer programs that populate Lucene indices and run indexer(s)
bull write searchbrowsedownload application logic using the SPARQL and Lucene tags
bull package project as war
bull deploy to Apache Tomcat server(s)
bull add new service to Apache HTTPD virtual host specification
66
UI Access to Cache Server
httpservicesld4lorgld4l_servicesloc_namejsp
Downloads
68
LoC httpidlocgovdownload (n-triples OR rdf-xml)
OCLC FAST httpwwwoclcorgresearchthemesdata-sciencefastdownloadhtml (n-triples)
GeoNames httpwwwgeonamesorgontologydocumentationhtml (custom format ndash see notes for processing)
AGROVOC httpsaims-faoatlassiannetwikispacesAGVpages2949126Releases (n-triples OR rdf-xml)
NALT httpsagclassnalusdagovdownloadshtml (rdf-xml)
DBpedia httpwikidbpediaorgdownloads-2016-04
Potential Options for Reconciliation
bull VIAF for name reconciliation ndash we are doing some
work with this
bull Wikidata ndash Ive heard that they are working on
Reconciliation issues but havent yet explored in
depth bull Intro Video (3hrs)
bull API Access
bull SPARQL ndash user manual
bull federated queries with other authorities
Doing a google search for linked data reconciliation
returns a large number of articles and presentations
on this concept
Links to Code amp More
bull QA Server - Code for a small app that provides the
Questioning Authority normalization layer
bull Linked Data Authorities - Configurations that can
be used with QA Server
bull LD4L Services - UI access to Cache Server
bull VitroLib - Code for the VitroLib cataloging tool
Cache Server Query Process
JSP Query API
combine all results
Jena-Fuseki
Triplestore
insert search rank in predicate
lthttpvivoweborgontology
corerankgt
LuceneSOLR
Index
sparql query for URI extract search rank
extract URI
for each result
lucene search for ezra cornell
index built with predicate values
ltskosprefLabelgt
ltskosaltLabelgt
httpservicesld4lorgld4l_servicesloc_name_batchjspquery=ezra20cornellampmaxRecords=10
One full setup per authority
UI-QA-Authority
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainampmaximumRecords=2
[urihttpidworldcatorgfast31622id31622
labelTwain Mark 1835-1910
urihttpidworldcatorgfast365563id365563
labelTwain Shania
httpexperimentalworldcatorgfastsearchquery=o
clcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
RDF of
search
results
Active-Triples
LDF Cache
(Marmotta or
Blazegraph) LDF Cache Jena-Fuseki-
Lucene
Cache
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
search of cache performed via Lucene index
Third Set of Challenges
8 Disambiguation through better context eg expand from just prefLabel tohellip
preLabel altLabel birthdeath dates occupation etc
9 Reconciliation across multiple sources eg match LoC URI to OCLC FAST URI
53
Whatrsquos next
Addressing Architectural Challenges
bull Generalize process for accessing context on the
cache server and in the normalization layer
bull Multi-authority search and reconciliation
bull Address the need for cache refresh
bull Mirrored cache servers
User Experience and Design
bull User-centered Design
bull Observe listen learn design evaluate iterate
bull Iteratively design and evaluate UI for lookupauthorities
with catalogers
bull Search result rankingorderingfiltering for catalogers
bull Additional UI platforms eg FOLIO
56
Questions
httptinyurlcomld4l-auth-access
Appendix for Challenges 1-4
Challenge 1 Documentation
58
LoC httpidlocgovtechcenter
C Harlow notes on reconciling LoC - httpsgithubcomcmh2166lc-reconcile
OCLC FAST
httpswwwoclcorgdeveloperdevelopweb-servicesfast-apilinked-dataenhtml
GeoNames
httpwwwgeonamesorgexportgeonames-searchhtml
AGROVOC httpaimsfaoorgvest-registryvocabulariesagrovoc-multilingual-agricultural-thesaurus
swagger config httpsgithubcomNatLibFiSkosmosblobmasterswaggerjson
NALT
httpsagclassnalusdagov
DBpedia httpwikidbpediaorgOnlineAccess1220Public20Faceted20Web20Service20Inter
face
Challenge 2 Linked Data Access API
59
for Search Query for Term Fetch
LoC not supported URI
OCLC FAST httpexperimentalworldcatorgfastsearchq
uery=subauth+all+22query22ampsortK
eys=usageampmaximumRecords=maximumR
ecords
URI
GeoNames httpapigeonamesorgsearchq=queryamp
maxRows=maxRowsampusername=userna
meamptype=rdf
URI
AGROVOC httpartemideartuniroma2it8081agrovocr
estv1searchquery=queryamplang=lang
httpartemideartuniroma2it8081agrovo
crestv1datauri=httpaimsfaoorgaosa
grovocterm_id
NALT httpskosmoslibrarycornelledurestv1nalt
searchquery=queryamplang=lang
httpskosmoslibrarycornelledurestv1na
ltdatauri=term_uri
DBpedia
Challenge 3 Varying Results Formats
60
for Search Query for Term Fetch
LoC not supported rdf-xml
OCLC FAST rdf-xml rdf-xml
GeoNames rdf-xml rdf-xml
AGROVOC json-ld rdf-xml json-ld turtle
NALT json-ld rdf-xml json-ld turtle
DBpedia
Challenge 4 Varying Ontologies
61
Primary Ontology Flat vs Navigation
required
LoC madsrdf
SKOS
navigation required
OCLC FAST schemaorg
SKOS
flat
GeoNames geonames flat
hierarchical
AGROVOC SKOS flat
hierarchical
NALT SKOS flat
hierarchical
DBpedia dbpedia flat
Configurations for Questioning Authority
62
LoC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_locconfigauthoritieslinked_dat
a
OCLC FAST httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_oclcfastconfigauthoritieslinked
_data
GeoNames httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_geonamesconfigauthoritieslink
ed_data
AGROVOC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_agrovocconfigauthoritieslinked
_data
NALT httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_naltconfigauthoritieslinked_dat
a
DBpedia httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_dbpediaconfigauthoritieslinked
_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
bull 8-core 64gb 3Ghz Mac Pro (late 2013) macOS Sierra
(10126)
bull 32tb Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
bull Apache Jena Fuseki 240 provides SPARQL endpoint
bull Apache Tomcat 90 runs custom web application(s)
bull Apache Lucene 36 provides search interface
64
Customizations
bull custom per-data-source JSP web application provides
searchbrowsedownload functionality
bull custom (generic) SPARQL Tag Library provides API for web
apps (available at httpsgithubcomeichmannlod-utilities)
bull custom (generic) Lucene Tag Library provides API for web apps
65
Loading a New Vocabulary
bull download RDF
bull if necessary convert to n-triples (required for GeoNames data for instance)
bull use tdbloader2 to populated triplestore
bull configure Fuseki server(s) with triplestore details
bull create new JSP project in Eclipse
bull write one or more indexer programs that populate Lucene indices and run indexer(s)
bull write searchbrowsedownload application logic using the SPARQL and Lucene tags
bull package project as war
bull deploy to Apache Tomcat server(s)
bull add new service to Apache HTTPD virtual host specification
66
UI Access to Cache Server
httpservicesld4lorgld4l_servicesloc_namejsp
Downloads
68
LoC httpidlocgovdownload (n-triples OR rdf-xml)
OCLC FAST httpwwwoclcorgresearchthemesdata-sciencefastdownloadhtml (n-triples)
GeoNames httpwwwgeonamesorgontologydocumentationhtml (custom format ndash see notes for processing)
AGROVOC httpsaims-faoatlassiannetwikispacesAGVpages2949126Releases (n-triples OR rdf-xml)
NALT httpsagclassnalusdagovdownloadshtml (rdf-xml)
DBpedia httpwikidbpediaorgdownloads-2016-04
Potential Options for Reconciliation
bull VIAF for name reconciliation ndash we are doing some
work with this
bull Wikidata ndash Ive heard that they are working on
Reconciliation issues but havent yet explored in
depth bull Intro Video (3hrs)
bull API Access
bull SPARQL ndash user manual
bull federated queries with other authorities
Doing a google search for linked data reconciliation
returns a large number of articles and presentations
on this concept
Links to Code amp More
bull QA Server - Code for a small app that provides the
Questioning Authority normalization layer
bull Linked Data Authorities - Configurations that can
be used with QA Server
bull LD4L Services - UI access to Cache Server
bull VitroLib - Code for the VitroLib cataloging tool
UI-QA-Authority
QA ndash normalize RDF returned from an authority
httplocalhost3000qasearchlinked_data
oclc_fastpersonal_nameq=twainampmaximumRecords=2
[urihttpidworldcatorgfast31622id31622
labelTwain Mark 1835-1910
urihttpidworldcatorgfast365563id365563
labelTwain Shania
httpexperimentalworldcatorgfastsearchquery=o
clcpersonalName+22twain22
ampsortKeys=usageampmaximumRecords=2
RDF of
search
results
Active-Triples
LDF Cache
(Marmotta or
Blazegraph) LDF Cache Jena-Fuseki-
Lucene
Cache
Direct Access
of External
Authority
HyraxVitrolib ndash UI for selecting an entry from an authority
search of cache performed via Lucene index
Third Set of Challenges
8 Disambiguation through better context eg expand from just prefLabel tohellip
preLabel altLabel birthdeath dates occupation etc
9 Reconciliation across multiple sources eg match LoC URI to OCLC FAST URI
53
Whatrsquos next
Addressing Architectural Challenges
bull Generalize process for accessing context on the
cache server and in the normalization layer
bull Multi-authority search and reconciliation
bull Address the need for cache refresh
bull Mirrored cache servers
User Experience and Design
bull User-centered Design
bull Observe listen learn design evaluate iterate
bull Iteratively design and evaluate UI for lookupauthorities
with catalogers
bull Search result rankingorderingfiltering for catalogers
bull Additional UI platforms eg FOLIO
56
Questions
httptinyurlcomld4l-auth-access
Appendix for Challenges 1-4
Challenge 1 Documentation
58
LoC httpidlocgovtechcenter
C Harlow notes on reconciling LoC - httpsgithubcomcmh2166lc-reconcile
OCLC FAST
httpswwwoclcorgdeveloperdevelopweb-servicesfast-apilinked-dataenhtml
GeoNames
httpwwwgeonamesorgexportgeonames-searchhtml
AGROVOC httpaimsfaoorgvest-registryvocabulariesagrovoc-multilingual-agricultural-thesaurus
swagger config httpsgithubcomNatLibFiSkosmosblobmasterswaggerjson
NALT
httpsagclassnalusdagov
DBpedia httpwikidbpediaorgOnlineAccess1220Public20Faceted20Web20Service20Inter
face
Challenge 2: Linked Data Access API
LoC
  Search query: not supported
  Term fetch: URI
OCLC FAST
  Search query: http://experimental.worldcat.org/fast/search?query={subauth}+all+%22{query}%22&sortKeys=usage&maximumRecords={maximumRecords}
  Term fetch: URI
GeoNames
  Search query: http://api.geonames.org/search?q={query}&maxRows={maxRows}&username={username}&type=rdf
  Term fetch: URI
AGROVOC
  Search query: http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/search?query={query}&lang={lang}
  Term fetch: http://artemide.art.uniroma2.it:8081/agrovoc/rest/v1/data?uri=http://aims.fao.org/aos/agrovoc/{term_id}
NALT
  Search query: http://skosmos.library.cornell.edu/rest/v1/nalt/search?query={query}&lang={lang}
  Term fetch: http://skosmos.library.cornell.edu/rest/v1/nalt/data?uri={term_uri}
DBpedia
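To make the differences between these access patterns concrete, here is a minimal sketch of filling per-authority search templates. The templates are transcribed from the listing above; the {name} placeholder syntax, the dictionary keys, and the build_search_url helper are assumptions of this sketch, not part of any of these services or of Questioning Authority.

```python
from urllib.parse import quote

# Search templates transcribed from the slide; {name} marks a value that
# the caller supplies. The keys "geonames" and "nalt" are illustrative.
SEARCH_TEMPLATES = {
    "geonames": (
        "http://api.geonames.org/search"
        "?q={query}&maxRows={maxRows}&username={username}&type=rdf"
    ),
    "nalt": (
        "http://skosmos.library.cornell.edu/rest/v1/nalt/search"
        "?query={query}&lang={lang}"
    ),
}


def build_search_url(authority: str, **params) -> str:
    """Fill a search template, percent-encoding every parameter value."""
    template = SEARCH_TEMPLATES[authority]
    encoded = {name: quote(str(value), safe="") for name, value in params.items()}
    return template.format(**encoded)


print(build_search_url("nalt", query="corn oil", lang="en"))
print(build_search_url("geonames", query="Ithaca", maxRows=10, username="demo"))
```

Each authority needs its own template and its own parameter names, which is the kind of variation a normalization layer has to hide from the cataloging UI.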
Challenge 3: Varying Results Formats
Authority    Search query     Term fetch
LoC          not supported    rdf-xml
OCLC FAST    rdf-xml          rdf-xml
GeoNames     rdf-xml          rdf-xml
AGROVOC      json-ld          rdf-xml, json-ld, turtle
NALT         json-ld          rdf-xml, json-ld, turtle
DBpedia
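One way a client might cope with the differing term-fetch formats is to choose a media type per authority. The sketch below uses the registered media types for RDF/XML, JSON-LD, and Turtle; the dictionary keys, the preference order, and the assumption that a service honors an HTTP Accept header are all illustrative and not confirmed by the slides.

```python
# Registered media types for the serializations named on the slide.
MEDIA_TYPES = {
    "rdf-xml": "application/rdf+xml",
    "json-ld": "application/ld+json",
    "turtle": "text/turtle",
}

# Term-fetch formats per authority, as listed on the slide.
FETCH_FORMATS = {
    "loc": ["rdf-xml"],
    "oclc_fast": ["rdf-xml"],
    "geonames": ["rdf-xml"],
    "agrovoc": ["rdf-xml", "json-ld", "turtle"],
    "nalt": ["rdf-xml", "json-ld", "turtle"],
}


def accept_header(authority, preferred=("json-ld", "turtle", "rdf-xml")):
    """Return the media type of the first preferred format the authority offers."""
    offered = FETCH_FORMATS[authority]
    for fmt in preferred:
        if fmt in offered:
            return MEDIA_TYPES[fmt]
    raise ValueError(f"no supported format for {authority}")


print(accept_header("loc"))
print(accept_header("nalt"))
```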
Challenge 4: Varying Ontologies
Authority    Primary ontology     Flat vs. navigation required
LoC          madsrdf, SKOS        navigation required
OCLC FAST    schema.org, SKOS     flat
GeoNames     geonames             flat, hierarchical
AGROVOC      SKOS                 flat, hierarchical
NALT         SKOS                 flat, hierarchical
DBpedia      dbpedia              flat
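Because each authority describes its terms with a different ontology, a normalization layer needs to know which predicate carries the primary label. The following is a minimal sketch under stated assumptions: the namespaces are the published ones, but the choice of predicate per authority, the dictionary keys, and the primary_label helper are illustrative guesses rather than the configuration actually used by Questioning Authority.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
MADSRDF = "http://www.loc.gov/mads/rdf/v1#"
SCHEMA = "http://schema.org/"
GN = "http://www.geonames.org/ontology#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Assumed label predicates, tried in order, for each authority.
LABEL_PREDICATES = {
    "loc": [MADSRDF + "authoritativeLabel", SKOS + "prefLabel"],
    "oclc_fast": [SCHEMA + "name", SKOS + "prefLabel"],
    "geonames": [GN + "name"],
    "agrovoc": [SKOS + "prefLabel"],
    "nalt": [SKOS + "prefLabel"],
    "dbpedia": [RDFS + "label"],
}


def primary_label(authority, subject, triples):
    """Return the first label found for subject, or None.

    triples is an iterable of (subject, predicate, object) string tuples,
    e.g. the result of parsing a term fetch with any RDF library.
    """
    triples = list(triples)
    for predicate in LABEL_PREDICATES[authority]:
        for s, p, o in triples:
            if s == subject and p == predicate:
                return o
    return None


term = "http://example.org/term/1"  # placeholder URI
data = [(term, SKOS + "prefLabel", "maize")]
print(primary_label("agrovoc", term, data))
```

The caller sees one normalized label regardless of which ontology the authority uses.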
Configurations for Questioning Authority
LoC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_loc/config/authorities/linked_data
OCLC FAST: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_oclcfast/config/authorities/linked_data
GeoNames: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_geonames/config/authorities/linked_data
AGROVOC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_agrovoc/config/authorities/linked_data
NALT: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_nalt/config/authorities/linked_data
DBpedia: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_dbpedia/config/authorities/linked_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
• 8-core, 64 GB, 3 GHz Mac Pro (late 2013), macOS Sierra (10.12.6)
• 32 TB Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
• Apache Jena Fuseki 2.4.0 provides SPARQL endpoint
• Apache Tomcat 9.0 runs custom web application(s)
• Apache Lucene 3.6 provides search interface
Customizations
• custom per-data-source JSP web application provides search/browse/download functionality
• custom (generic) SPARQL Tag Library provides API for web apps (available at https://github.com/eichmann/lod-utilities)
• custom (generic) Lucene Tag Library provides API for web apps
Loading a New Vocabulary
• download RDF
• if necessary, convert to N-Triples (required for GeoNames data, for instance)
• use tdbloader2 to populate the triplestore
• configure Fuseki server(s) with triplestore details
• create new JSP project in Eclipse
• write one or more indexer programs that populate Lucene indices, and run the indexer(s)
• write search/browse/download application logic using the SPARQL and Lucene tags
• package project as a WAR
• deploy to Apache Tomcat server(s)
• add new service to Apache HTTPD virtual host specification
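The triplestore steps in the list above can be sketched as a small helper that assembles the command lines without running them. The file name, directory, and dataset name are hypothetical, and the options shown (`--loc` for tdbloader2, `--loc=` plus a dataset path for fuseki-server) should be checked against the installed Apache Jena version before use.

```python
from pathlib import Path
from typing import List


def load_commands(ntriples_file: str, tdb_dir: str, dataset_name: str) -> List[List[str]]:
    """Build (but do not run) the loader and server command lines."""
    if Path(ntriples_file).suffix != ".nt":
        # The conversion step is out of scope for this sketch.
        raise ValueError("convert the download to N-Triples (.nt) first")
    return [
        # Bulk-load the N-Triples file into a TDB directory.
        ["tdbloader2", "--loc", tdb_dir, ntriples_file],
        # Serve that directory as a SPARQL endpoint under /<dataset_name>.
        ["fuseki-server", f"--loc={tdb_dir}", f"/{dataset_name}"],
    ]


for command in load_commands("geonames.nt", "/data/tdb/geonames", "geonames"):
    print(" ".join(command))
```

Each returned list can be passed to `subprocess.run(command, check=True)` once the paths have been confirmed; keeping construction separate from execution makes the steps easy to review per vocabulary.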
UI Access to Cache Server
http://services.ld4l.org/ld4l_services/loc_name.jsp
Downloads
LoC: http://id.loc.gov/download/ (n-triples OR rdf-xml)
OCLC FAST: http://www.oclc.org/research/themes/data-science/fast/download.html (n-triples)
GeoNames: http://www.geonames.org/ontology/documentation.html (custom format - see notes for processing)
AGROVOC: https://aims-fao.atlassian.net/wiki/spaces/AGV/pages/2949126/Releases (n-triples OR rdf-xml)
NALT: https://agclass.nal.usda.gov/download.shtml (rdf-xml)
DBpedia: http://wiki.dbpedia.org/downloads-2016-04
Potential Options for Reconciliation
• VIAF for name reconciliation - we are doing some work with this
• Wikidata - I've heard that they are working on reconciliation issues, but haven't yet explored in depth
  • Intro Video (3 hrs)
  • API Access
  • SPARQL - user manual
  • federated queries with other authorities
Doing a Google search for "linked data reconciliation" returns a large number of articles and presentations on this concept.
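For a sense of what the simplest form of reconciliation looks like, the sketch below pairs URIs from two sources whose labels match after normalization. This is a naive baseline offered only for illustration, not the approach taken by VIAF, Wikidata, or this project; the function names and the example.org URIs are hypothetical.

```python
import unicodedata
from typing import Dict, List, Tuple


def normalize(label: str) -> str:
    """Casefold, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    kept = [
        c if c.isalnum() or c.isspace() else " "
        for c in decomposed
        if not unicodedata.combining(c)  # drop accent marks
    ]
    return " ".join("".join(kept).casefold().split())


def reconcile(source_a: Dict[str, str], source_b: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair URIs from two {uri: label} maps whose normalized labels are equal."""
    index: Dict[str, List[str]] = {}
    for uri, label in source_b.items():
        index.setdefault(normalize(label), []).append(uri)
    return [
        (uri_a, uri_b)
        for uri_a, label in source_a.items()
        for uri_b in index.get(normalize(label), [])
    ]


a = {"http://example.org/a/1": "Twain, Mark, 1835-1910"}
b = {
    "http://example.org/b/9": "Twain, Mark, 1835-1910",
    "http://example.org/b/10": "Clemens, Samuel",
}
print(reconcile(a, b))
```

Exact label matching misses variant forms and can pair distinct entities that share a name, which is why the richer context from challenge 8 matters for confirming a match.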
Links to Code & More
• QA Server - Code for a small app that provides the Questioning Authority normalization layer
• Linked Data Authorities - Configurations that can be used with QA Server
• LD4L Services - UI access to Cache Server
• VitroLib - Code for the VitroLib cataloging tool
Appendix for Challenges 1-4
Challenge 1 Documentation
58
LoC httpidlocgovtechcenter
C Harlow notes on reconciling LoC - httpsgithubcomcmh2166lc-reconcile
OCLC FAST
httpswwwoclcorgdeveloperdevelopweb-servicesfast-apilinked-dataenhtml
GeoNames
httpwwwgeonamesorgexportgeonames-searchhtml
AGROVOC httpaimsfaoorgvest-registryvocabulariesagrovoc-multilingual-agricultural-thesaurus
swagger config httpsgithubcomNatLibFiSkosmosblobmasterswaggerjson
NALT
httpsagclassnalusdagov
DBpedia httpwikidbpediaorgOnlineAccess1220Public20Faceted20Web20Service20Inter
face
Challenge 2 Linked Data Access API
59
for Search Query for Term Fetch
LoC not supported URI
OCLC FAST httpexperimentalworldcatorgfastsearchq
uery=subauth+all+22query22ampsortK
eys=usageampmaximumRecords=maximumR
ecords
URI
GeoNames httpapigeonamesorgsearchq=queryamp
maxRows=maxRowsampusername=userna
meamptype=rdf
URI
AGROVOC httpartemideartuniroma2it8081agrovocr
estv1searchquery=queryamplang=lang
httpartemideartuniroma2it8081agrovo
crestv1datauri=httpaimsfaoorgaosa
grovocterm_id
NALT httpskosmoslibrarycornelledurestv1nalt
searchquery=queryamplang=lang
httpskosmoslibrarycornelledurestv1na
ltdatauri=term_uri
DBpedia
Challenge 3 Varying Results Formats
60
for Search Query for Term Fetch
LoC not supported rdf-xml
OCLC FAST rdf-xml rdf-xml
GeoNames rdf-xml rdf-xml
AGROVOC json-ld rdf-xml json-ld turtle
NALT json-ld rdf-xml json-ld turtle
DBpedia
Challenge 4 Varying Ontologies
61
Primary Ontology Flat vs Navigation
required
LoC madsrdf
SKOS
navigation required
OCLC FAST schemaorg
SKOS
flat
GeoNames geonames flat
hierarchical
AGROVOC SKOS flat
hierarchical
NALT SKOS flat
hierarchical
DBpedia dbpedia flat
Configurations for Questioning Authority
LoC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_loc/config/authorities/linked_data
OCLC FAST: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_oclcfast/config/authorities/linked_data
GeoNames: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_geonames/config/authorities/linked_data
AGROVOC: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_agrovoc/config/authorities/linked_data
NALT: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_nalt/config/authorities/linked_data
DBpedia: https://github.com/ld4l-labs/linked_data_authorities/tree/master/qa_dbpedia/config/authorities/linked_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
• 8-core, 64 GB, 3 GHz Mac Pro (late 2013), macOS Sierra (10.12.6)
• 32 TB Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
• Apache Jena Fuseki 2.4.0 provides SPARQL endpoint
• Apache Tomcat 9.0 runs custom web application(s)
• Apache Lucene 3.6 provides search interface
Customizations
• custom per-data-source JSP web application provides search/browse/download functionality
• custom (generic) SPARQL Tag Library provides API for web apps (available at https://github.com/eichmann/lod-utilities)
• custom (generic) Lucene Tag Library provides API for web apps
Loading a New Vocabulary
• download RDF
• if necessary, convert to n-triples (required for GeoNames data, for instance)
• use tdbloader2 to populate the triplestore
• configure Fuseki server(s) with triplestore details
• create new JSP project in Eclipse
• write one or more indexer programs that populate Lucene indices, and run the indexer(s)
• write search/browse/download application logic using the SPARQL and Lucene tags
• package project as war
• deploy to Apache Tomcat server(s)
• add new service to Apache HTTPD virtual host specification
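The triplestore steps in the list above can be sketched as two command lines. The Python snippet below only assembles and prints them; it runs nothing. It assumes a standalone `fuseki-server` started with `--loc`, whereas the slide's "configure Fuseki server(s)" may well refer to a configuration file. The file name, TDB directory, and dataset name are hypothetical.

```python
import shlex


def load_commands(ntriples_file, tdb_dir, dataset_name):
    """Return the commands to bulk-load N-Triples into TDB and serve them."""
    # tdbloader2 bulk-loads into the TDB database directory given by --loc.
    load = ["tdbloader2", "--loc", tdb_dir, ntriples_file]
    # Assumption: standalone Fuseki pointed at the same TDB directory.
    serve = ["fuseki-server", f"--loc={tdb_dir}", f"/{dataset_name}"]
    return [load, serve]


# Hypothetical paths and names, for illustration only.
commands = load_commands("loc_names.nt", "/data/tdb/loc_names", "loc_names")
for command in commands:
    print(shlex.join(command))
```

Keeping the commands as argument lists makes them safe to pass to `subprocess.run` later without shell quoting problems.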
UI Access to Cache Server
http://services.ld4l.org/ld4l_services/loc_name.jsp
Downloads
LoC: http://id.loc.gov/download/ (n-triples OR rdf-xml)
OCLC FAST: http://www.oclc.org/research/themes/data-science/fast/download.html (n-triples)
GeoNames: http://www.geonames.org/ontology/documentation.html (custom format - see notes for processing)
AGROVOC: https://aims-fao.atlassian.net/wiki/spaces/AGV/pages/2949126/Releases (n-triples OR rdf-xml)
NALT: https://agclass.nal.usda.gov/download.shtml (rdf-xml)
DBpedia: http://wiki.dbpedia.org/downloads-2016-04
Potential Options for Reconciliation
• VIAF for name reconciliation - we are doing some work with this
• Wikidata - I've heard that they are working on reconciliation issues, but haven't yet explored in depth
  • Intro Video (3hrs)
  • API Access
  • SPARQL - user manual
  • federated queries with other authorities
Doing a Google search for "linked data reconciliation" returns a large number of articles and presentations on this concept.
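As one hedged illustration of the Wikidata SPARQL option, the sketch below builds a query URL that asks the public Wikidata endpoint for the item carrying a given VIAF ID (Wikidata property P214). Joining VIAF and Wikidata this way is an assumption made for the example, not something the project reports having done. The function name and the ID value are hypothetical, and no request is sent.

```python
from urllib.parse import urlencode

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def viaf_lookup_url(viaf_id):
    """Build a Wikidata SPARQL URL that finds the item with this VIAF ID."""
    if not viaf_id.isdigit():
        raise ValueError("VIAF IDs are expected to be numeric strings")
    query = (
        "SELECT ?item ?itemLabel WHERE { "
        f'?item wdt:P214 "{viaf_id}" . '
        'SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } '
        "}"
    )
    return WIKIDATA_SPARQL + "?" + urlencode({"query": query, "format": "json"})


# Hypothetical VIAF ID, for illustration only.
lookup_url = viaf_lookup_url("12345")
print(lookup_url)
```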
Links to Code & More
• QA Server - Code for a small app that provides the Questioning Authority normalization layer
• Linked Data Authorities - Configurations that can be used with QA Server
• LD4L Services - UI access to Cache Server
• VitroLib - Code for the VitroLib cataloging tool
Challenge 1 Documentation
58
LoC httpidlocgovtechcenter
C Harlow notes on reconciling LoC - httpsgithubcomcmh2166lc-reconcile
OCLC FAST
httpswwwoclcorgdeveloperdevelopweb-servicesfast-apilinked-dataenhtml
GeoNames
httpwwwgeonamesorgexportgeonames-searchhtml
AGROVOC httpaimsfaoorgvest-registryvocabulariesagrovoc-multilingual-agricultural-thesaurus
swagger config httpsgithubcomNatLibFiSkosmosblobmasterswaggerjson
NALT
httpsagclassnalusdagov
DBpedia httpwikidbpediaorgOnlineAccess1220Public20Faceted20Web20Service20Inter
face
Challenge 2 Linked Data Access API
59
for Search Query for Term Fetch
LoC not supported URI
OCLC FAST httpexperimentalworldcatorgfastsearchq
uery=subauth+all+22query22ampsortK
eys=usageampmaximumRecords=maximumR
ecords
URI
GeoNames httpapigeonamesorgsearchq=queryamp
maxRows=maxRowsampusername=userna
meamptype=rdf
URI
AGROVOC httpartemideartuniroma2it8081agrovocr
estv1searchquery=queryamplang=lang
httpartemideartuniroma2it8081agrovo
crestv1datauri=httpaimsfaoorgaosa
grovocterm_id
NALT httpskosmoslibrarycornelledurestv1nalt
searchquery=queryamplang=lang
httpskosmoslibrarycornelledurestv1na
ltdatauri=term_uri
DBpedia
Challenge 3 Varying Results Formats
60
for Search Query for Term Fetch
LoC not supported rdf-xml
OCLC FAST rdf-xml rdf-xml
GeoNames rdf-xml rdf-xml
AGROVOC json-ld rdf-xml json-ld turtle
NALT json-ld rdf-xml json-ld turtle
DBpedia
Challenge 4 Varying Ontologies
61
Primary Ontology Flat vs Navigation
required
LoC madsrdf
SKOS
navigation required
OCLC FAST schemaorg
SKOS
flat
GeoNames geonames flat
hierarchical
AGROVOC SKOS flat
hierarchical
NALT SKOS flat
hierarchical
DBpedia dbpedia flat
Configurations for Questioning Authority
62
LoC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_locconfigauthoritieslinked_dat
a
OCLC FAST httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_oclcfastconfigauthoritieslinked
_data
GeoNames httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_geonamesconfigauthoritieslink
ed_data
AGROVOC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_agrovocconfigauthoritieslinked
_data
NALT httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_naltconfigauthoritieslinked_dat
a
DBpedia httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_dbpediaconfigauthoritieslinked
_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
bull 8-core 64gb 3Ghz Mac Pro (late 2013) macOS Sierra
(10126)
bull 32tb Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
bull Apache Jena Fuseki 240 provides SPARQL endpoint
bull Apache Tomcat 90 runs custom web application(s)
bull Apache Lucene 36 provides search interface
64
Customizations
bull custom per-data-source JSP web application provides
searchbrowsedownload functionality
bull custom (generic) SPARQL Tag Library provides API for web
apps (available at httpsgithubcomeichmannlod-utilities)
bull custom (generic) Lucene Tag Library provides API for web apps
65
Loading a New Vocabulary
bull download RDF
bull if necessary convert to n-triples (required for GeoNames data for instance)
bull use tdbloader2 to populated triplestore
bull configure Fuseki server(s) with triplestore details
bull create new JSP project in Eclipse
bull write one or more indexer programs that populate Lucene indices and run indexer(s)
bull write searchbrowsedownload application logic using the SPARQL and Lucene tags
bull package project as war
bull deploy to Apache Tomcat server(s)
bull add new service to Apache HTTPD virtual host specification
66
UI Access to Cache Server
httpservicesld4lorgld4l_servicesloc_namejsp
Downloads
68
LoC httpidlocgovdownload (n-triples OR rdf-xml)
OCLC FAST httpwwwoclcorgresearchthemesdata-sciencefastdownloadhtml (n-triples)
GeoNames httpwwwgeonamesorgontologydocumentationhtml (custom format ndash see notes for processing)
AGROVOC httpsaims-faoatlassiannetwikispacesAGVpages2949126Releases (n-triples OR rdf-xml)
NALT httpsagclassnalusdagovdownloadshtml (rdf-xml)
DBpedia httpwikidbpediaorgdownloads-2016-04
Potential Options for Reconciliation
bull VIAF for name reconciliation ndash we are doing some
work with this
bull Wikidata ndash Ive heard that they are working on
Reconciliation issues but havent yet explored in
depth bull Intro Video (3hrs)
bull API Access
bull SPARQL ndash user manual
bull federated queries with other authorities
Doing a google search for linked data reconciliation
returns a large number of articles and presentations
on this concept
Links to Code amp More
bull QA Server - Code for a small app that provides the
Questioning Authority normalization layer
bull Linked Data Authorities - Configurations that can
be used with QA Server
bull LD4L Services - UI access to Cache Server
bull VitroLib - Code for the VitroLib cataloging tool
Challenge 2 Linked Data Access API
59
for Search Query for Term Fetch
LoC not supported URI
OCLC FAST httpexperimentalworldcatorgfastsearchq
uery=subauth+all+22query22ampsortK
eys=usageampmaximumRecords=maximumR
ecords
URI
GeoNames httpapigeonamesorgsearchq=queryamp
maxRows=maxRowsampusername=userna
meamptype=rdf
URI
AGROVOC httpartemideartuniroma2it8081agrovocr
estv1searchquery=queryamplang=lang
httpartemideartuniroma2it8081agrovo
crestv1datauri=httpaimsfaoorgaosa
grovocterm_id
NALT httpskosmoslibrarycornelledurestv1nalt
searchquery=queryamplang=lang
httpskosmoslibrarycornelledurestv1na
ltdatauri=term_uri
DBpedia
Challenge 3 Varying Results Formats
60
for Search Query for Term Fetch
LoC not supported rdf-xml
OCLC FAST rdf-xml rdf-xml
GeoNames rdf-xml rdf-xml
AGROVOC json-ld rdf-xml json-ld turtle
NALT json-ld rdf-xml json-ld turtle
DBpedia
Challenge 4 Varying Ontologies
61
Primary Ontology Flat vs Navigation
required
LoC madsrdf
SKOS
navigation required
OCLC FAST schemaorg
SKOS
flat
GeoNames geonames flat
hierarchical
AGROVOC SKOS flat
hierarchical
NALT SKOS flat
hierarchical
DBpedia dbpedia flat
Configurations for Questioning Authority
62
LoC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_locconfigauthoritieslinked_dat
a
OCLC FAST httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_oclcfastconfigauthoritieslinked
_data
GeoNames httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_geonamesconfigauthoritieslink
ed_data
AGROVOC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_agrovocconfigauthoritieslinked
_data
NALT httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_naltconfigauthoritieslinked_dat
a
DBpedia httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_dbpediaconfigauthoritieslinked
_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
bull 8-core 64gb 3Ghz Mac Pro (late 2013) macOS Sierra
(10126)
bull 32tb Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
bull Apache Jena Fuseki 240 provides SPARQL endpoint
bull Apache Tomcat 90 runs custom web application(s)
bull Apache Lucene 36 provides search interface
64
Customizations
bull custom per-data-source JSP web application provides
searchbrowsedownload functionality
bull custom (generic) SPARQL Tag Library provides API for web
apps (available at httpsgithubcomeichmannlod-utilities)
bull custom (generic) Lucene Tag Library provides API for web apps
65
Loading a New Vocabulary
bull download RDF
bull if necessary convert to n-triples (required for GeoNames data for instance)
bull use tdbloader2 to populated triplestore
bull configure Fuseki server(s) with triplestore details
bull create new JSP project in Eclipse
bull write one or more indexer programs that populate Lucene indices and run indexer(s)
bull write searchbrowsedownload application logic using the SPARQL and Lucene tags
bull package project as war
bull deploy to Apache Tomcat server(s)
bull add new service to Apache HTTPD virtual host specification
66
UI Access to Cache Server
httpservicesld4lorgld4l_servicesloc_namejsp
Downloads
68
LoC httpidlocgovdownload (n-triples OR rdf-xml)
OCLC FAST httpwwwoclcorgresearchthemesdata-sciencefastdownloadhtml (n-triples)
GeoNames httpwwwgeonamesorgontologydocumentationhtml (custom format ndash see notes for processing)
AGROVOC httpsaims-faoatlassiannetwikispacesAGVpages2949126Releases (n-triples OR rdf-xml)
NALT httpsagclassnalusdagovdownloadshtml (rdf-xml)
DBpedia httpwikidbpediaorgdownloads-2016-04
Potential Options for Reconciliation
bull VIAF for name reconciliation ndash we are doing some
work with this
bull Wikidata ndash Ive heard that they are working on
Reconciliation issues but havent yet explored in
depth bull Intro Video (3hrs)
bull API Access
bull SPARQL ndash user manual
bull federated queries with other authorities
Doing a google search for linked data reconciliation
returns a large number of articles and presentations
on this concept
Links to Code amp More
bull QA Server - Code for a small app that provides the
Questioning Authority normalization layer
bull Linked Data Authorities - Configurations that can
be used with QA Server
bull LD4L Services - UI access to Cache Server
bull VitroLib - Code for the VitroLib cataloging tool
Challenge 3 Varying Results Formats
60
for Search Query for Term Fetch
LoC not supported rdf-xml
OCLC FAST rdf-xml rdf-xml
GeoNames rdf-xml rdf-xml
AGROVOC json-ld rdf-xml json-ld turtle
NALT json-ld rdf-xml json-ld turtle
DBpedia
Challenge 4 Varying Ontologies
61
Primary Ontology Flat vs Navigation
required
LoC madsrdf
SKOS
navigation required
OCLC FAST schemaorg
SKOS
flat
GeoNames geonames flat
hierarchical
AGROVOC SKOS flat
hierarchical
NALT SKOS flat
hierarchical
DBpedia dbpedia flat
Configurations for Questioning Authority
62
LoC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_locconfigauthoritieslinked_dat
a
OCLC FAST httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_oclcfastconfigauthoritieslinked
_data
GeoNames httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_geonamesconfigauthoritieslink
ed_data
AGROVOC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_agrovocconfigauthoritieslinked
_data
NALT httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_naltconfigauthoritieslinked_dat
a
DBpedia httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_dbpediaconfigauthoritieslinked
_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
bull 8-core 64gb 3Ghz Mac Pro (late 2013) macOS Sierra
(10126)
bull 32tb Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
bull Apache Jena Fuseki 240 provides SPARQL endpoint
bull Apache Tomcat 90 runs custom web application(s)
bull Apache Lucene 36 provides search interface
64
Customizations
bull custom per-data-source JSP web application provides
searchbrowsedownload functionality
bull custom (generic) SPARQL Tag Library provides API for web
apps (available at httpsgithubcomeichmannlod-utilities)
bull custom (generic) Lucene Tag Library provides API for web apps
65
Loading a New Vocabulary
bull download RDF
bull if necessary convert to n-triples (required for GeoNames data for instance)
bull use tdbloader2 to populated triplestore
bull configure Fuseki server(s) with triplestore details
bull create new JSP project in Eclipse
bull write one or more indexer programs that populate Lucene indices and run indexer(s)
bull write searchbrowsedownload application logic using the SPARQL and Lucene tags
bull package project as war
bull deploy to Apache Tomcat server(s)
bull add new service to Apache HTTPD virtual host specification
66
UI Access to Cache Server
httpservicesld4lorgld4l_servicesloc_namejsp
Downloads
68
LoC httpidlocgovdownload (n-triples OR rdf-xml)
OCLC FAST httpwwwoclcorgresearchthemesdata-sciencefastdownloadhtml (n-triples)
GeoNames httpwwwgeonamesorgontologydocumentationhtml (custom format ndash see notes for processing)
AGROVOC httpsaims-faoatlassiannetwikispacesAGVpages2949126Releases (n-triples OR rdf-xml)
NALT httpsagclassnalusdagovdownloadshtml (rdf-xml)
DBpedia httpwikidbpediaorgdownloads-2016-04
Potential Options for Reconciliation
bull VIAF for name reconciliation ndash we are doing some
work with this
bull Wikidata ndash Ive heard that they are working on
Reconciliation issues but havent yet explored in
depth bull Intro Video (3hrs)
bull API Access
bull SPARQL ndash user manual
bull federated queries with other authorities
Doing a google search for linked data reconciliation
returns a large number of articles and presentations
on this concept
Links to Code amp More
bull QA Server - Code for a small app that provides the
Questioning Authority normalization layer
bull Linked Data Authorities - Configurations that can
be used with QA Server
bull LD4L Services - UI access to Cache Server
bull VitroLib - Code for the VitroLib cataloging tool
Challenge 4 Varying Ontologies
61
Primary Ontology Flat vs Navigation
required
LoC madsrdf
SKOS
navigation required
OCLC FAST schemaorg
SKOS
flat
GeoNames geonames flat
hierarchical
AGROVOC SKOS flat
hierarchical
NALT SKOS flat
hierarchical
DBpedia dbpedia flat
Configurations for Questioning Authority
62
LoC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_locconfigauthoritieslinked_dat
a
OCLC FAST httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_oclcfastconfigauthoritieslinked
_data
GeoNames httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_geonamesconfigauthoritieslink
ed_data
AGROVOC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_agrovocconfigauthoritieslinked
_data
NALT httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_naltconfigauthoritieslinked_dat
a
DBpedia httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_dbpediaconfigauthoritieslinked
_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
bull 8-core 64gb 3Ghz Mac Pro (late 2013) macOS Sierra
(10126)
bull 32tb Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
bull Apache Jena Fuseki 240 provides SPARQL endpoint
bull Apache Tomcat 90 runs custom web application(s)
bull Apache Lucene 36 provides search interface
64
Customizations
bull custom per-data-source JSP web application provides
searchbrowsedownload functionality
bull custom (generic) SPARQL Tag Library provides API for web
apps (available at httpsgithubcomeichmannlod-utilities)
bull custom (generic) Lucene Tag Library provides API for web apps
65
Loading a New Vocabulary
bull download RDF
bull if necessary convert to n-triples (required for GeoNames data for instance)
bull use tdbloader2 to populated triplestore
bull configure Fuseki server(s) with triplestore details
bull create new JSP project in Eclipse
bull write one or more indexer programs that populate Lucene indices and run indexer(s)
bull write searchbrowsedownload application logic using the SPARQL and Lucene tags
bull package project as war
bull deploy to Apache Tomcat server(s)
bull add new service to Apache HTTPD virtual host specification
66
UI Access to Cache Server
httpservicesld4lorgld4l_servicesloc_namejsp
Downloads
68
LoC httpidlocgovdownload (n-triples OR rdf-xml)
OCLC FAST httpwwwoclcorgresearchthemesdata-sciencefastdownloadhtml (n-triples)
GeoNames httpwwwgeonamesorgontologydocumentationhtml (custom format ndash see notes for processing)
AGROVOC httpsaims-faoatlassiannetwikispacesAGVpages2949126Releases (n-triples OR rdf-xml)
NALT httpsagclassnalusdagovdownloadshtml (rdf-xml)
DBpedia httpwikidbpediaorgdownloads-2016-04
Potential Options for Reconciliation
bull VIAF for name reconciliation ndash we are doing some
work with this
bull Wikidata ndash Ive heard that they are working on
Reconciliation issues but havent yet explored in
depth bull Intro Video (3hrs)
bull API Access
bull SPARQL ndash user manual
bull federated queries with other authorities
Doing a google search for linked data reconciliation
returns a large number of articles and presentations
on this concept
Links to Code amp More
bull QA Server - Code for a small app that provides the
Questioning Authority normalization layer
bull Linked Data Authorities - Configurations that can
be used with QA Server
bull LD4L Services - UI access to Cache Server
bull VitroLib - Code for the VitroLib cataloging tool
Configurations for Questioning Authority
62
LoC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_locconfigauthoritieslinked_dat
a
OCLC FAST httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_oclcfastconfigauthoritieslinked
_data
GeoNames httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_geonamesconfigauthoritieslink
ed_data
AGROVOC httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_agrovocconfigauthoritieslinked
_data
NALT httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_naltconfigauthoritieslinked_dat
a
DBpedia httpsgithubcomld4l-
labslinked_data_authoritiestreemasterqa_dbpediaconfigauthoritieslinked
_data
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
bull 8-core 64gb 3Ghz Mac Pro (late 2013) macOS Sierra
(10126)
bull 32tb Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
bull Apache Jena Fuseki 240 provides SPARQL endpoint
bull Apache Tomcat 90 runs custom web application(s)
bull Apache Lucene 36 provides search interface
64
Customizations
bull custom per-data-source JSP web application provides
searchbrowsedownload functionality
bull custom (generic) SPARQL Tag Library provides API for web
apps (available at httpsgithubcomeichmannlod-utilities)
bull custom (generic) Lucene Tag Library provides API for web apps
65
Loading a New Vocabulary
bull download RDF
bull if necessary convert to n-triples (required for GeoNames data for instance)
bull use tdbloader2 to populated triplestore
bull configure Fuseki server(s) with triplestore details
bull create new JSP project in Eclipse
bull write one or more indexer programs that populate Lucene indices and run indexer(s)
bull write searchbrowsedownload application logic using the SPARQL and Lucene tags
bull package project as war
bull deploy to Apache Tomcat server(s)
bull add new service to Apache HTTPD virtual host specification
66
UI Access to Cache Server
httpservicesld4lorgld4l_servicesloc_namejsp
Downloads
68
LoC httpidlocgovdownload (n-triples OR rdf-xml)
OCLC FAST httpwwwoclcorgresearchthemesdata-sciencefastdownloadhtml (n-triples)
GeoNames httpwwwgeonamesorgontologydocumentationhtml (custom format ndash see notes for processing)
AGROVOC httpsaims-faoatlassiannetwikispacesAGVpages2949126Releases (n-triples OR rdf-xml)
NALT httpsagclassnalusdagovdownloadshtml (rdf-xml)
DBpedia httpwikidbpediaorgdownloads-2016-04
Potential Options for Reconciliation
bull VIAF for name reconciliation ndash we are doing some
work with this
bull Wikidata ndash Ive heard that they are working on
Reconciliation issues but havent yet explored in
depth bull Intro Video (3hrs)
bull API Access
bull SPARQL ndash user manual
bull federated queries with other authorities
Doing a google search for linked data reconciliation
returns a large number of articles and presentations
on this concept
Links to Code amp More
bull QA Server - Code for a small app that provides the
Questioning Authority normalization layer
bull Linked Data Authorities - Configurations that can
be used with QA Server
bull LD4L Services - UI access to Cache Server
bull VitroLib - Code for the VitroLib cataloging tool
Appendix for Challenges 5-7
Creating a Cache Server
Hardware
bull 8-core 64gb 3Ghz Mac Pro (late 2013) macOS Sierra
(10126)
bull 32tb Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
bull Apache Jena Fuseki 240 provides SPARQL endpoint
bull Apache Tomcat 90 runs custom web application(s)
bull Apache Lucene 36 provides search interface
64
Customizations
bull custom per-data-source JSP web application provides
searchbrowsedownload functionality
bull custom (generic) SPARQL Tag Library provides API for web
apps (available at httpsgithubcomeichmannlod-utilities)
bull custom (generic) Lucene Tag Library provides API for web apps
65
Loading a New Vocabulary
bull download RDF
bull if necessary convert to n-triples (required for GeoNames data for instance)
bull use tdbloader2 to populated triplestore
bull configure Fuseki server(s) with triplestore details
bull create new JSP project in Eclipse
bull write one or more indexer programs that populate Lucene indices and run indexer(s)
bull write searchbrowsedownload application logic using the SPARQL and Lucene tags
bull package project as war
bull deploy to Apache Tomcat server(s)
bull add new service to Apache HTTPD virtual host specification
66
UI Access to Cache Server
httpservicesld4lorgld4l_servicesloc_namejsp
Downloads
68
LoC httpidlocgovdownload (n-triples OR rdf-xml)
OCLC FAST httpwwwoclcorgresearchthemesdata-sciencefastdownloadhtml (n-triples)
GeoNames httpwwwgeonamesorgontologydocumentationhtml (custom format ndash see notes for processing)
AGROVOC httpsaims-faoatlassiannetwikispacesAGVpages2949126Releases (n-triples OR rdf-xml)
NALT httpsagclassnalusdagovdownloadshtml (rdf-xml)
DBpedia httpwikidbpediaorgdownloads-2016-04
Potential Options for Reconciliation
bull VIAF for name reconciliation ndash we are doing some
work with this
bull Wikidata ndash Ive heard that they are working on
Reconciliation issues but havent yet explored in
depth bull Intro Video (3hrs)
bull API Access
bull SPARQL ndash user manual
bull federated queries with other authorities
Doing a google search for linked data reconciliation
returns a large number of articles and presentations
on this concept
Links to Code amp More
bull QA Server - Code for a small app that provides the
Questioning Authority normalization layer
bull Linked Data Authorities - Configurations that can
be used with QA Server
bull LD4L Services - UI access to Cache Server
bull VitroLib - Code for the VitroLib cataloging tool
Creating a Cache Server
Hardware
bull 8-core 64gb 3Ghz Mac Pro (late 2013) macOS Sierra
(10126)
bull 32tb Pegasus-2 Thunderbolt RAID configured as RAID-5
Triplestore
bull Apache Jena Fuseki 240 provides SPARQL endpoint
bull Apache Tomcat 90 runs custom web application(s)
bull Apache Lucene 36 provides search interface
64
Customizations
bull custom per-data-source JSP web application provides
searchbrowsedownload functionality
bull custom (generic) SPARQL Tag Library provides API for web
apps (available at httpsgithubcomeichmannlod-utilities)
bull custom (generic) Lucene Tag Library provides API for web apps
65
Loading a New Vocabulary
bull download RDF
bull if necessary convert to n-triples (required for GeoNames data for instance)
bull use tdbloader2 to populated triplestore
bull configure Fuseki server(s) with triplestore details
bull create new JSP project in Eclipse
bull write one or more indexer programs that populate Lucene indices and run indexer(s)
bull write searchbrowsedownload application logic using the SPARQL and Lucene tags
bull package project as war
bull deploy to Apache Tomcat server(s)
bull add new service to Apache HTTPD virtual host specification
66
UI Access to Cache Server
httpservicesld4lorgld4l_servicesloc_namejsp
Downloads
68
LoC httpidlocgovdownload (n-triples OR rdf-xml)
OCLC FAST httpwwwoclcorgresearchthemesdata-sciencefastdownloadhtml (n-triples)
GeoNames httpwwwgeonamesorgontologydocumentationhtml (custom format ndash see notes for processing)
AGROVOC httpsaims-faoatlassiannetwikispacesAGVpages2949126Releases (n-triples OR rdf-xml)
NALT httpsagclassnalusdagovdownloadshtml (rdf-xml)
DBpedia httpwikidbpediaorgdownloads-2016-04
Potential Options for Reconciliation
bull VIAF for name reconciliation ndash we are doing some
work with this
bull Wikidata ndash Ive heard that they are working on
Reconciliation issues but havent yet explored in
depth bull Intro Video (3hrs)
bull API Access
bull SPARQL ndash user manual
bull federated queries with other authorities
Doing a google search for linked data reconciliation
returns a large number of articles and presentations
on this concept
Links to Code amp More
bull QA Server - Code for a small app that provides the
Questioning Authority normalization layer
bull Linked Data Authorities - Configurations that can
be used with QA Server
bull LD4L Services - UI access to Cache Server
bull VitroLib - Code for the VitroLib cataloging tool
Customizations
bull custom per-data-source JSP web application provides
searchbrowsedownload functionality
bull custom (generic) SPARQL Tag Library provides API for web
apps (available at httpsgithubcomeichmannlod-utilities)
bull custom (generic) Lucene Tag Library provides API for web apps
65
Loading a New Vocabulary
bull download RDF
bull if necessary convert to n-triples (required for GeoNames data for instance)
bull use tdbloader2 to populated triplestore
bull configure Fuseki server(s) with triplestore details
bull create new JSP project in Eclipse
bull write one or more indexer programs that populate Lucene indices and run indexer(s)
bull write searchbrowsedownload application logic using the SPARQL and Lucene tags
bull package project as war
bull deploy to Apache Tomcat server(s)
bull add new service to Apache HTTPD virtual host specification
66
UI Access to Cache Server
httpservicesld4lorgld4l_servicesloc_namejsp
Downloads
68
LoC httpidlocgovdownload (n-triples OR rdf-xml)
OCLC FAST httpwwwoclcorgresearchthemesdata-sciencefastdownloadhtml (n-triples)
GeoNames httpwwwgeonamesorgontologydocumentationhtml (custom format ndash see notes for processing)
AGROVOC httpsaims-faoatlassiannetwikispacesAGVpages2949126Releases (n-triples OR rdf-xml)
NALT httpsagclassnalusdagovdownloadshtml (rdf-xml)
DBpedia httpwikidbpediaorgdownloads-2016-04
Potential Options for Reconciliation
bull VIAF for name reconciliation ndash we are doing some
work with this
bull Wikidata ndash Ive heard that they are working on
Reconciliation issues but havent yet explored in
depth bull Intro Video (3hrs)
bull API Access
bull SPARQL ndash user manual
bull federated queries with other authorities
Doing a google search for linked data reconciliation
returns a large number of articles and presentations
on this concept
Links to Code amp More
bull QA Server - Code for a small app that provides the
Questioning Authority normalization layer
bull Linked Data Authorities - Configurations that can
be used with QA Server
bull LD4L Services - UI access to Cache Server
bull VitroLib - Code for the VitroLib cataloging tool
Loading a New Vocabulary
bull download RDF
bull if necessary convert to n-triples (required for GeoNames data for instance)
bull use tdbloader2 to populated triplestore
bull configure Fuseki server(s) with triplestore details
bull create new JSP project in Eclipse
bull write one or more indexer programs that populate Lucene indices and run indexer(s)
bull write searchbrowsedownload application logic using the SPARQL and Lucene tags
bull package project as war
bull deploy to Apache Tomcat server(s)
bull add new service to Apache HTTPD virtual host specification
66
UI Access to Cache Server
httpservicesld4lorgld4l_servicesloc_namejsp
Downloads
68
LoC httpidlocgovdownload (n-triples OR rdf-xml)
OCLC FAST httpwwwoclcorgresearchthemesdata-sciencefastdownloadhtml (n-triples)
GeoNames httpwwwgeonamesorgontologydocumentationhtml (custom format ndash see notes for processing)
AGROVOC httpsaims-faoatlassiannetwikispacesAGVpages2949126Releases (n-triples OR rdf-xml)
NALT httpsagclassnalusdagovdownloadshtml (rdf-xml)
DBpedia httpwikidbpediaorgdownloads-2016-04
Potential Options for Reconciliation
bull VIAF for name reconciliation ndash we are doing some
work with this
bull Wikidata ndash Ive heard that they are working on
Reconciliation issues but havent yet explored in
depth bull Intro Video (3hrs)
bull API Access
bull SPARQL ndash user manual
bull federated queries with other authorities
Doing a google search for linked data reconciliation
returns a large number of articles and presentations
on this concept
Links to Code amp More
bull QA Server - Code for a small app that provides the
Questioning Authority normalization layer
bull Linked Data Authorities - Configurations that can
be used with QA Server
bull LD4L Services - UI access to Cache Server
bull VitroLib - Code for the VitroLib cataloging tool
UI Access to Cache Server
httpservicesld4lorgld4l_servicesloc_namejsp