The Linked Data Snowball and Why We Need Reconciliation
December 1st, 2014
The Andrew W. Mellon Foundation
Workshop on Reconciliation of Linked Open Data
Rob Sanderson / azaroth@stanford.edu / @azaroth42
web.stanford.edu/~azaroth/#me
azaroth42@gmail.com / +azaroth42
orcid: 0000-0003-4441-6852
http://www.informatik.uni-trier.de/~ley/pers/hd/s/Sanderson:Robert
http://academic.research.microsoft.com/Author/2765999
http://www.scopus.com/authid/detail.url?authorId=8988953600
www.researchgate.net/profile/Rob_Sanderson
facebook.com/rob.sanderson / linkedin.com/pub/robert-sanderson/1/172/5a6/
rsanderson@lanl.gov / azaroth@liv.ac.uk
public.lanl.gov/rsanderson / gondolin.hist.liv.ac.uk/~azaroth
rds23@student.canterbury.ac.nz / azaroth@es-net.co.nz
A Brief Survey of Linked Open Data
http://lod-cloud.net/ as of Aug 2014
Some Highlights
Libraries:
BNF, DNB, BL, LoC, KB, ...
Archives:
SNAC, LOCAH, Medici Archives, ...
Museums:
BM, YCBA, vu.nl, Smithsonian, Getty, AAC, ...
Consortia:
Europeana (+), TEL, RLUK, DPLA, ...
Government:
data.gov, data.gov.uk, legislation.gov.uk, ...
Companies:
OCLC, Google, IBM, New York Times, ...
Lots of Adoption = Lots of URIs
For the Same Thing :(
Why So Many?
Before reusing someone else's URI, ask:
• Do I know the URI, or can I find it?
• Do I understand and agree with the model used?
• Do I understand and agree with the description?
• Do I agree the URI identifies the same entity?
• Do I agree the description is complete?
Any "No": mint a new URI.
All "Yes": Hooray, you reused a URI! Now start again with the next one :(
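The checklist above can be sketched in a few lines of Python. This is only an illustration: it assumes that any "No" answer ends in minting a new URI (an interpretation of the slide's flowchart), and the names `Answers` and `decide` are hypothetical.

```python
from dataclasses import dataclass, astuple


@dataclass
class Answers:
    """One yes/no answer per question on the slide (field names are hypothetical)."""
    can_find_uri: bool
    agree_with_model: bool
    agree_with_description: bool
    identifies_same_entity: bool
    description_complete: bool


def decide(answers: Answers) -> str:
    """Return "reuse" only when every answer is yes; otherwise "mint"."""
    return "reuse" if all(astuple(answers)) else "mint"


# A single "No" anywhere in the chain is enough to produce yet another URI.
print(decide(Answers(True, True, True, True, True)))
print(decide(Answers(True, True, True, False, True)))
```

Five independent chances to say "No" per entity is one way to see why reuse is rare in practice.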
Many Special and Unique Snowflakes
Become a Huge Technical Debt Snowball
Option 1: Balance the Equation
Cost(Create URI) + Cost(Maintain URI)
<=
Cost(Find Good URI) + Cost(Understand Model) + Cost(Understand Content) + Cost(Network Latency) + min( Risk(Reliability), Cost(Cache Content) ) - Value(Linking Graph)
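To make the comparison concrete, here is a minimal sketch of the two sides of the equation. It assumes the reading that minting (create plus maintain) sits on one side and reuse (everything else, less the value of the linking graph) on the other. The function names and all numbers are invented for illustration and carry no units.

```python
def cost_of_minting(create: float, maintain: float) -> float:
    """Cost(Create URI) + Cost(Maintain URI)."""
    return create + maintain


def cost_of_reuse(find: float, model: float, content: float, latency: float,
                  reliability_risk: float, cache_cost: float,
                  linking_value: float) -> float:
    """Cost of reusing an existing URI, net of the value of the linking graph."""
    return (find + model + content + latency
            + min(reliability_risk, cache_cost)
            - linking_value)


# Hypothetical figures: today, minting looks cheaper than reuse.
mint = cost_of_minting(create=1, maintain=2)
reuse_today = cost_of_reuse(find=3, model=4, content=2, latency=1,
                            reliability_risk=5, cache_cost=2, linking_value=3)
print(mint <= reuse_today)

# Balancing the equation means driving down the discovery and understanding
# costs until reuse is the cheaper choice.
reuse_balanced = cost_of_reuse(find=0.5, model=1, content=0.5, latency=1,
                               reliability_risk=5, cache_cost=2, linking_value=3)
print(reuse_balanced < mint)
```

The `min(...)` term captures the choice between accepting the remote service's reliability risk and paying to cache its content locally.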
Option 1 Likelihood
Option 2: Reconciliation of URIs
[Diagram: Stanford's URIs; British Library's URIs]
[Diagram: Stanford's Entities; British Library's Entities; Shared Entities without Shared URIs]
Option 2: Reconciliation Process
Discover this intersection given the descriptions of the entities
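As a rough illustration of discovering the intersection from descriptions alone, the sketch below matches two tiny datasets on a normalized name plus birth year. Everything here is an assumption made for the example: the `example.org` URIs, the `name` and `born` fields, and the exact-match strategy. Real reconciliation would need fuzzier matching and human review, and emitting `owl:sameAs` is just one common way to record the result.

```python
import unicodedata


def normalize(name: str) -> str:
    """Fold case, accents, punctuation, and word order into a crude match key."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    tokens = stripped.lower().replace(",", " ").replace(".", " ").split()
    return " ".join(sorted(tokens))


def reconcile(left: dict, right: dict) -> list:
    """Pair URIs whose descriptions share a normalized name and birth year."""
    index = {}
    for uri, desc in right.items():
        key = (normalize(desc["name"]), desc.get("born"))
        index.setdefault(key, []).append(uri)
    links = []
    for uri, desc in left.items():
        key = (normalize(desc["name"]), desc.get("born"))
        for other in index.get(key, []):
            links.append((uri, other))
    return links


# Hypothetical descriptions held by two institutions.
stanford = {
    "http://example.org/stanford/person/1": {"name": "Austen, Jane", "born": 1775},
    "http://example.org/stanford/person/2": {"name": "Brontë, Charlotte", "born": 1816},
}
british_library = {
    "http://example.org/bl/agent/a": {"name": "Jane Austen", "born": 1775},
    "http://example.org/bl/agent/b": {"name": "Charles Dickens", "born": 1812},
}

links = reconcile(stanford, british_library)
for ours, theirs in links:
    print(f"<{ours}> owl:sameAs <{theirs}> .")
```

Indexing one side by key keeps the comparison close to linear in the number of descriptions, rather than comparing every pair.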
Option 2: Reconciliation Process
Best sort of engineering problem:
• Easy to explain
• Helps many organizations at once
• Provides significant value and utility
• Difficult to solve
But:
• Requires community adoption of the results
Current Community
Expectation Management is Important
Or at best:
Top Three Questions to Answer (according to Rob)
Which sorts of entities should this community reconcile?
How can we share the engineering internationally?
How do we ensure future usage of the reconciled entities?
Thoughts: Entities to Reconcile
Start with least controversial and most unique
• Unique physical objects
• People
• Places
Must generate consensus around identity within the LAM
community.
Must focus on unique selling points – how can we be more
useful than DBpedia for our own entities?
Thoughts: Shared Engineering
Let a thousand snowflakes fall ...
... then build the best snowball possible.
• Solve small, manageable problems well
• Interoperability between platforms: plug and play
• Communicate continuously
Focused projects that fit into a whole, leveraging the experts in
the appropriate domain.
Requires some degree of community structure and
management to ensure we're building off each other.
Thoughts: Ensure Usage
Build consensus early and often
• Between institutions
• Within the LAM community
• Outside the LAM community
Who should be here that isn't? There are lots of libraries, but we also need
input from museums and archives, as they have more unique
entities.
We need to ensure that LAM use the reconciled entities,
which involves starting to balance the cost equation.
Thank You!
December 1st, 2014
Rob Sanderson / azaroth@stanford.edu / @azaroth42