The Linked Data Snowball and Why We Need Reconciliation · 2014-12-05 · The Linked Data Snowball...

The Linked Data Snowball and Why We Need Reconciliation

December 1st, 2014

The Andrew W. Mellon Foundation

Workshop on Reconciliation of Linked Open Data

Rob Sanderson / azaroth@stanford.edu / @azaroth42

web.stanford.edu/~azaroth/#me

azaroth42@gmail.com / +azaroth42

orcid: 0000-0003-4441-6852

http://www.informatik.uni-trier.de/~ley/pers/hd/s/Sanderson:Robert

http://academic.research.microsoft.com/Author/2765999

http://www.scopus.com/authid/detail.url?authorId=8988953600

www.researchgate.net/profile/Rob_Sanderson

facebook.com/rob.sanderson / linkedin.com/pub/robert-sanderson/1/172/5a6/

rsanderson@lanl.gov / azaroth@liv.ac.uk

public.lanl.gov/rsanderson / gondolin.hist.liv.ac.uk/~azaroth

rds23@student.canterbury.ac.nz / azaroth@es-net.co.nz

A Brief Survey of Linked Open Data

http://lod-cloud.net/ as of Aug 2014

Some Highlights

Libraries:

BNF, DNB, BL, LoC, KB, ...

Archives:

SNAC, LOCAH, Medici Archives, ...

Museums:

BM, YCBA, vu.nl, Smithsonian, Getty, AAC, ...

Consortia:

Europeana (+), TEL, RLUK, DPLA, ...

Government:

data.gov, data.gov.uk, legislation.gov.uk, ...

Companies:

OCLC, Google, IBM, New York Times, ...

Lots of Adoption = Lots of URIs

For the Same Thing :(

Why So Many?

Do I know the URI, or can I find it? No

Understand and agree with the model used? No

Understand and agree with the description? No

Agree the URI identifies the same entity? No

Agree description is complete? No

URI

Yes

Hooray, you reused a URI! Now start again with the next one :(
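
One reading of this chart is a chain of checks in which any single "No" ends with yet another URI being minted, and only an unbroken run of "Yes" answers ends in reuse. A minimal sketch of that reading follows; the names `ReuseChecks` and `choose_uri` and the example URIs are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass, fields


@dataclass
class ReuseChecks:
    """Answers to the five questions, in the order the slide asks them."""
    can_find_uri: bool
    agree_with_model: bool
    agree_with_description: bool
    agree_same_entity: bool
    agree_description_complete: bool


def choose_uri(existing_uri: str, own_uri: str, checks: ReuseChecks) -> str:
    """Return the existing URI only if every answer is yes; otherwise use our own."""
    for f in fields(checks):
        if not getattr(checks, f.name):
            return own_uri  # a single "No" is enough to create yet another URI
    return existing_uri  # "Hooray, you reused a URI!"


# Hypothetical URIs, for illustration only.
EXISTING = "http://their.example/person/42"
OWN = "http://our.example/agent/abc"

all_yes = ReuseChecks(True, True, True, True, True)
one_no = ReuseChecks(True, True, False, True, True)

reused = choose_uri(EXISTING, OWN, all_yes)
minted = choose_uri(EXISTING, OWN, one_no)
```

The whole procedure then repeats for the next entity, which is where the cost accumulates.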

Many Special and Unique Snowflakes

Become a Huge Technical Debt Snowball

Option 1: Balance the Equation

Cost(Create URI) + Cost(Maintain URI)

<=

Cost(Find Good URI) + Cost(Understand Model) + Cost(Understand Content) + Cost(Network Latency) + min( Risk(Reliability), Cost(Cache Content) ) - Value(Linking Graph)
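
The terms on this slide can be read as the cost of creating and maintaining your own URI on one side and the net cost of reusing someone else's on the other, joined by `<=`. That placement of `<=` is an interpretation of the slide layout. Under that assumption, a small sketch of the comparison looks like this; the class name, function name, and all numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReuseCosts:
    """Terms on the reuse side of the slide's inequality (arbitrary units)."""
    find_good_uri: float
    understand_model: float
    understand_content: float
    network_latency: float
    risk_reliability: float
    cache_content: float
    value_linking_graph: float

    def total(self) -> float:
        # You either accept the reliability risk or pay to cache, whichever is less.
        mitigation = min(self.risk_reliability, self.cache_content)
        return (self.find_good_uri + self.understand_model
                + self.understand_content + self.network_latency
                + mitigation - self.value_linking_graph)


def minting_is_cheaper(create: float, maintain: float, reuse: ReuseCosts) -> bool:
    """True when creating and maintaining a new URI costs no more than reusing one."""
    return create + maintain <= reuse.total()


# Made-up numbers: reuse nets out at 3 + 2 + 2 + 1 + min(4, 2) - 5 = 5.
today = ReuseCosts(3, 2, 2, 1, 4, 2, 5)
# Same costs, but a more valuable linking graph: 10 - 8 = 2.
balanced = ReuseCosts(3, 2, 2, 1, 4, 2, 8)

mint_today = minting_is_cheaper(1, 2, today)
mint_balanced = minting_is_cheaper(1, 2, balanced)
```

Balancing the equation means either lowering the reuse-side costs or raising the value of the linking graph until minting a new URI stops being the cheaper choice.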

Option 1 Likelihood

Option 2: Reconciliation of URIs

Stanford's URIs / British Library's URIs

Stanford's Entities / British Library's Entities

Shared Entities without Shared URIs

Option 2: Reconciliation Process

Discover this intersection given the descriptions of the entities
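
As a rough illustration of what discovering the intersection from descriptions can mean, the sketch below pairs URIs from two datasets when their descriptions agree on a normalized name and a birth year. This is a deliberately naive matching rule chosen for the example, not a method proposed in the talk; the function names, the record layout, and every URI are hypothetical.

```python
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, and sort the name tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(sorted(cleaned.split()))


def reconcile(ours: dict, theirs: dict) -> list:
    """Return (our_uri, their_uri) pairs whose descriptions share name and birth year."""
    index = {}
    for uri, desc in theirs.items():
        key = (normalize(desc["name"]), desc["born"])
        index.setdefault(key, []).append(uri)
    matches = []
    for uri, desc in ours.items():
        key = (normalize(desc["name"]), desc["born"])
        for other in index.get(key, []):
            matches.append((uri, other))
    return matches


# Hypothetical records and URIs, for illustration only.
stanford = {
    "http://stanford.example/person/1": {"name": "Austen, Jane", "born": 1775},
    "http://stanford.example/person/2": {"name": "Charlotte Brontë", "born": 1816},
}
british_library = {
    "http://bl.example/agent/a": {"name": "Jane Austen", "born": 1775},
    "http://bl.example/agent/b": {"name": "Bronte, Charlotte", "born": 1816},
    "http://bl.example/agent/c": {"name": "Jane Austen", "born": 1950},
}

pairs = reconcile(stanford, british_library)
```

Each resulting pair is a candidate link that could, for instance, be published as an `owl:sameAs` statement. Real descriptions are messier than this, with missing dates, variant spellings, and conflicting models, so exact-key matching of this kind would only be a starting point.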

Best sort of engineering problem:

• Easy to explain

• Helps many organizations at once

• Provides significant value and utility

• Difficult to solve

But:

• Requires community adoption of the results

Current Community

Expectation Management is Important

Or at best:

Top Three Questions to Answer (according to Rob)

Which sorts of entities should this community reconcile?

How can we share the engineering internationally?

How do we ensure future usage of the reconciled entities?

Thoughts: Entities to Reconcile

Start with least controversial and most unique

• Unique physical objects

• People

• Places

Must generate consensus around identity within the LAM community.

Must focus on unique selling points: how can we be more useful than DBpedia for our own entities?

Thoughts: Shared Engineering

Let a thousand snowflakes fall ...

... then build the best snowball possible.

• Solve small, manageable problems well

• Interoperability between platforms: plug and play

• Communicate continuously

Focused projects that fit into a whole, leveraging the experts in the appropriate domain.

Requires some degree of community structure and management to ensure we're building off each other.

Thoughts: Ensure Usage

Build consensus early and often

• Between institutions

• Within the LAM community

• Outside the LAM community

Who should be here that isn't? There are lots of libraries; we also need input from museums and archives, as they have more unique entities.

We need to ensure that LAMs use the reconciled entities, which involves starting to balance the cost equation.

Thank You!

December 1st, 2014

Rob Sanderson / azaroth@stanford.edu / @azaroth42

web.stanford.edu/~azaroth/#me

azaroth42@gmail.com / +azaroth42

orcid: 0000-0003-4441-6852

http://www.informatik.uni-trier.de/~ley/pers/hd/s/Sanderson:Robert

http://academic.research.microsoft.com/Author/2765999

http://www.scopus.com/authid/detail.url?authorId=8988953600

www.researchgate.net/profile/Rob_Sanderson

facebook.com/rob.sanderson / linkedin.com/pub/robert-sanderson/1/172/5a6/

rsanderson@lanl.gov / azaroth@liv.ac.uk

public.lanl.gov/rsanderson / gondolin.hist.liv.ac.uk/~azaroth

rds23@student.canterbury.ac.nz / azaroth@es-net.co.nz
