Skolemising Blank Nodes while Preserving Isomorphism

Aidan Hogan – DCC, Universidad de Chile


Page 1: Skolemising Blank Nodes while Preserving Isomorphism

Skolemising Blank Nodes while Preserving Isomorphism

Aidan Hogan – DCC, Universidad de Chile

Page 2: Skolemising Blank Nodes while Preserving Isomorphism

WHY? BLANK NODES ARE GREAT!

Page 3: Skolemising Blank Nodes while Preserving Isomorphism

When life gives you blank nodes …

Page 4: Skolemising Blank Nodes while Preserving Isomorphism

Blank Nodes are glue!

Page 5: Skolemising Blank Nodes while Preserving Isomorphism

Blank node names aren’t important …

(Isomorphic)

Page 6: Skolemising Blank Nodes while Preserving Isomorphism

Blank nodes are common in real-world data …

Aidan Hogan, Marcelo Arenas, Alejandro Mallea and Axel Polleres. “Everything You Always Wanted to Know About Blank Nodes.” Journal of Web Semantics 27: pp. 42–69, 2014.

Page 7: Skolemising Blank Nodes while Preserving Isomorphism

BLANK NODES ENABLE SYNTAX SHORTCUTS
• They represent implicit nodes in the graph
• They help specify order, higher-arity relations, reification, etc., succinctly
• They are common in real-world data

Page 8: Skolemising Blank Nodes while Preserving Isomorphism

BLANK NODES: WHAT’S THE PROBLEM?

Page 9: Skolemising Blank Nodes while Preserving Isomorphism

Are two RDF graphs isomorphic?

Page 10: Skolemising Blank Nodes while Preserving Isomorphism

Are two RDF graphs isomorphic?

Page 11: Skolemising Blank Nodes while Preserving Isomorphism

RDF ISOMORPHISM IS GI-COMPLETE
A general algorithm to see if two RDF graphs are the “same” will (probably) not be tractable

Page 12: Skolemising Blank Nodes while Preserving Isomorphism

BLANK NODES ADD COMPLEXITY?
WHAT TO DO?

Page 13: Skolemising Blank Nodes while Preserving Isomorphism

RDF 1.1 proposes Skolemisation
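A minimal sketch of what Skolemisation does, assuming triples are plain tuples of N-Triples-style strings and blank nodes are the terms starting with `_:`. The function name `skolemise` and the `http://example.org` authority are illustrative assumptions; the `/.well-known/genid/` path is the one RDF 1.1 suggests for Skolem IRIs.

```python
import uuid

def skolemise(triples, authority="http://example.org"):
    """Replace every blank node with a freshly minted Skolem IRI."""
    mapping = {}  # blank node label -> Skolem IRI, so each node is renamed once

    def sk(term):
        if term.startswith("_:"):
            if term not in mapping:
                fresh = uuid.uuid4().hex  # globally unique suffix (an assumption of this sketch)
                mapping[term] = f"<{authority}/.well-known/genid/{fresh}>"
            return mapping[term]
        return term  # IRIs and literals are left untouched

    return {(sk(s), p, sk(o)) for s, p, o in triples}

graph = {
    ("_:a", "<http://example.org/knows>", "_:b"),
    ("_:b", "<http://example.org/name>", '"Bob"'),
}
first = skolemise(graph)
second = skolemise(graph)
```

Note that `first` and `second` use different IRIs for the same blank nodes, since each call mints fresh ones.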

Page 14: Skolemising Blank Nodes while Preserving Isomorphism

But fresh IRIs every time is not ideal

Page 15: Skolemising Blank Nodes while Preserving Isomorphism

But fresh IRIs every time is not ideal

Page 16: Skolemising Blank Nodes while Preserving Isomorphism

Would prefer a “consistent” labelling

Page 17: Skolemising Blank Nodes while Preserving Isomorphism

Would prefer a “consistent” labelling

Page 18: Skolemising Blank Nodes while Preserving Isomorphism

Compute isomorphically-unique graph hash

Page 19: Skolemising Blank Nodes while Preserving Isomorphism

Finding duplicate documents from a crawler

Page 20: Skolemising Blank Nodes while Preserving Isomorphism

CANONICAL LABELLING USEFUL FOR:
1. Mapping blank nodes to IRIs
2. Computing unique hashes for RDF graphs

Page 21: Skolemising Blank Nodes while Preserving Isomorphism

OLD BUT RECURRING QUESTION

Page 22: Skolemising Blank Nodes while Preserving Isomorphism

An old question that won’t go away …

Jeremy J. Carroll. “Signing RDF Graphs.” ISWC 2003.

Edzard Höfig, Ina Schieferdecker. “Hashing of RDF Graphs and a Solution to the Blank Node Problem.” URSW 2014.

Page 23: Skolemising Blank Nodes while Preserving Isomorphism

NO EXISTING APPROACH IS GENERAL
• Hard cases seem unlikely in practice
• Let’s build a general (and thus worst-case exponential) algorithm that’s efficient for practical cases

Page 24: Skolemising Blank Nodes while Preserving Isomorphism

NAÏVE CANONICAL LABELLING SCHEME

Page 25: Skolemising Blank Nodes while Preserving Isomorphism

(Naïve) Canonical labels for blank nodes

Page 26: Skolemising Blank Nodes while Preserving Isomorphism

But wait … what happens if ... ?

Page 27: Skolemising Blank Nodes while Preserving Isomorphism

Or another case …

Page 28: Skolemising Blank Nodes while Preserving Isomorphism

Or another case …

Page 29: Skolemising Blank Nodes while Preserving Isomorphism

Or another case …

Page 30: Skolemising Blank Nodes while Preserving Isomorphism

Fixpoint does not distinguish all blank nodes!

Page 31: Skolemising Blank Nodes while Preserving Isomorphism

NAÏVE: COLOUR BLANK NODES RECURSIVELY UNTIL FIXPOINT
• Efficient
• Incomplete
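The naïve scheme can be sketched as follows. This is an illustration under assumptions, not the talk's exact procedure: triples are tuples of strings, blank nodes start with `_:`, MD5 is used as the hash, and the names `h` and `naive_colouring` are made up for the example.

```python
import hashlib

def h(*parts):
    """Combine strings into a single 128-bit hex digest."""
    return hashlib.md5("\x00".join(parts).encode("utf-8")).hexdigest()

def naive_colouring(triples):
    """Colour blank nodes by hashing their neighbourhoods until a fixpoint."""
    bnodes = {t for s, _, o in triples for t in (s, o) if t.startswith("_:")}
    col = {b: h("blank") for b in bnodes}  # every blank node starts with the same colour
    while True:
        new = {}
        for b in bnodes:
            sig = []
            for s, p, o in triples:
                # Ground terms contribute themselves; blank nodes contribute their colour.
                if s == b:
                    sig.append(h("out", p, col.get(o, o)))
                if o == b:
                    sig.append(h("in", p, col.get(s, s)))
            new[b] = h(col[b], *sorted(sig))  # sorted, so triple order does not matter
        # New colours include the old ones, so the partition can only get finer:
        # an unchanged number of colour classes means a fixpoint.
        if len(set(new.values())) == len(set(col.values())):
            return new
        col = new

P = "<http://example.org/p>"
path = {("_:a", P, "_:b"), ("_:b", P, "_:c")}
cycle = {("_:a", P, "_:b"), ("_:b", P, "_:c"), ("_:c", P, "_:a")}
path_colours = naive_colouring(path)
cycle_colours = naive_colouring(cycle)
```

On the path, all three blank nodes end up with different colours. On the cycle, every node looks the same to its neighbours, so the fixpoint leaves them all with one colour: this is the incompleteness.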

Page 32: Skolemising Blank Nodes while Preserving Isomorphism

CANONICAL LABELLING SCHEME: ALWAYS DISTINGUISH ALL BLANK NODES

Brendan D. McKay. “Practical graph isomorphism.” Congressus Numerantium 30: pp. 45–87, 1981.

Page 33: Skolemising Blank Nodes while Preserving Isomorphism

Start with a (non-distinguished) colouring …

Page 34: Skolemising Blank Nodes while Preserving Isomorphism

Let’s distinguish a node …

Page 35: Skolemising Blank Nodes while Preserving Isomorphism

Let’s distinguish a node …

Page 36: Skolemising Blank Nodes while Preserving Isomorphism

Colouring is no longer a fixpoint!

Page 37: Skolemising Blank Nodes while Preserving Isomorphism

Rerun colouring to fixpoint

Page 38: Skolemising Blank Nodes while Preserving Isomorphism

Rerun colouring to fixpoint

Page 39: Skolemising Blank Nodes while Preserving Isomorphism

Rerun colouring to fixpoint

Page 40: Skolemising Blank Nodes while Preserving Isomorphism

Rerun colouring to fixpoint

Page 41: Skolemising Blank Nodes while Preserving Isomorphism

Fixpoint reached: still not finished!

Page 42: Skolemising Blank Nodes while Preserving Isomorphism

So again let’s distinguish another …

Page 43: Skolemising Blank Nodes while Preserving Isomorphism

… and rerun colouring to fixpoint

Page 44: Skolemising Blank Nodes while Preserving Isomorphism

… and rerun colouring to fixpoint

Page 45: Skolemising Blank Nodes while Preserving Isomorphism

… and rerun colouring to fixpoint

Page 46: Skolemising Blank Nodes while Preserving Isomorphism

… and rerun colouring to fixpoint

Page 47: Skolemising Blank Nodes while Preserving Isomorphism

… and rerun colouring to fixpoint

Page 48: Skolemising Blank Nodes while Preserving Isomorphism

… and rerun colouring to fixpoint

Page 49: Skolemising Blank Nodes while Preserving Isomorphism

Now all blank nodes are distinguished!

Page 50: Skolemising Blank Nodes while Preserving Isomorphism

Blank node labels computed from colour

Page 51: Skolemising Blank Nodes while Preserving Isomorphism

Let’s go back: first, why pick _:a and _:c?

Page 52: Skolemising Blank Nodes while Preserving Isomorphism

Okay so: why _:a …

Page 53: Skolemising Blank Nodes while Preserving Isomorphism

Adapt ideas from the Nauty algorithm (for standard graph isomorphism)

Page 54: Skolemising Blank Nodes while Preserving Isomorphism

Adapt ideas from the Nauty algorithm (for standard graph isomorphism)

Page 55: Skolemising Blank Nodes while Preserving Isomorphism

Check all leaves for minimum graph

Page 56: Skolemising Blank Nodes while Preserving Isomorphism

What happened?

Page 57: Skolemising Blank Nodes while Preserving Isomorphism

What happened?

Page 58: Skolemising Blank Nodes while Preserving Isomorphism

What happened?

Page 59: Skolemising Blank Nodes while Preserving Isomorphism

Automorphisms cause repetitions

Page 60: Skolemising Blank Nodes while Preserving Isomorphism

CORE ALGORITHM: FIND MINIMAL GRAPH FOLLOWING FIXED COLOURING RULES
• Complete
• Efficient for many cases?
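The distinguish-and-refine search can be sketched as below. This is a simplified illustration, not the talk's implementation: it assumes triples are tuples of strings with blank nodes starting with `_:`, uses MD5, always branches on the tied colour class with the lowest hash, and does no pruning. The names `h`, `classes`, `refine` and `canonical_graph` are hypothetical.

```python
import hashlib

def h(*parts):
    """Combine strings into a single 128-bit hex digest."""
    return hashlib.md5("\x00".join(parts).encode("utf-8")).hexdigest()

def classes(col):
    """Group blank nodes by colour."""
    groups = {}
    for node, c in col.items():
        groups.setdefault(c, []).append(node)
    return groups

def refine(triples, col):
    """Rerun the colouring from `col` until the partition stops changing."""
    while True:
        new = {}
        for b in col:
            sig = []
            for s, p, o in triples:
                if s == b:
                    sig.append(h("out", p, col.get(o, o)))
                if o == b:
                    sig.append(h("in", p, col.get(s, s)))
            new[b] = h(col[b], *sorted(sig))
        if len(classes(new)) == len(classes(col)):
            return new
        col = new

def canonical_graph(triples):
    """Return the minimum fully-distinguished graph over all search leaves."""
    bnodes = {t for s, _, o in triples for t in (s, o) if t.startswith("_:")}

    def search(col):
        col = refine(triples, col)
        tied = sorted((c, nodes) for c, nodes in classes(col).items() if len(nodes) > 1)
        if not tied:
            # Leaf: every blank node has its own colour, so label nodes by colour.
            label = lambda t: "_:" + col[t] if t in col else t
            return sorted((label(s), p, label(o)) for s, p, o in triples)
        colour, nodes = tied[0]  # chosen by colour only, never by the original label
        leaves = []
        for b in nodes:  # try distinguishing each node of the tied class in turn
            branch = dict(col)
            branch[b] = h(colour, "distinguished")
            leaves.append(search(branch))
        return min(leaves)

    return search({b: h("blank") for b in bnodes})

P = "<http://example.org/p>"
tri1 = {("_:a", P, "_:b"), ("_:b", P, "_:c"), ("_:c", P, "_:a")}
tri2 = {("_:x", P, "_:z"), ("_:z", P, "_:y"), ("_:y", P, "_:x")}
six = {(f"_:n{i}", P, f"_:n{(i + 1) % 6}") for i in range(6)}
two = tri1 | {("_:d", P, "_:e"), ("_:e", P, "_:f"), ("_:f", P, "_:d")}
```

Here `tri1` and `tri2` are isomorphic and get the same canonical graph, while `six` (one 6-cycle) and `two` (two 3-cycles) are told apart even though naïve colouring alone leaves every node in both graphs with the same colour. Hashing the returned triples would give a graph-level hash.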

Page 61: Skolemising Blank Nodes while Preserving Isomorphism

OKAY … SO WHAT HASHING TO USE?

Page 62: Skolemising Blank Nodes while Preserving Isomorphism

What about hash collisions?

128 bit: MD5, Murmur3_128
160 bit: SHA1

Page 63: Skolemising Blank Nodes while Preserving Isomorphism

HASHING MAY LEAD TO COLLISIONS
• The algorithm doesn’t care which hash function you use
• A 128-bit hash is the shortest with an acceptable collision probability
• For cryptographic use-cases, SHA-256 or better might be needed
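To get a feel for "acceptable collision probability", the standard birthday approximation p ≈ 1 − exp(−n(n−1)/2^(b+1)) can be evaluated for n hashed items and b bits. This is a back-of-the-envelope sketch that assumes an ideal, uniformly distributed hash; the function name and the sample sizes are illustrative, not figures from the talk.

```python
import math

def collision_probability(n, bits):
    """Approximate chance that at least two of n ideal `bits`-bit hashes collide."""
    expected_pairs = n * (n - 1) / 2 / 2**bits
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-expected_pairs)

p_128 = collision_probability(10**9, 128)  # a billion hashed terms, 128-bit hash
p_32 = collision_probability(10**6, 32)    # a million hashed terms, 32-bit hash
```

Under this model a billion 128-bit hashes are very unlikely to collide by accident, while a 32-bit hash is almost certain to collide within a million items. Deliberately constructed collisions are a separate concern, which is where stronger hashes come in.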

Page 64: Skolemising Blank Nodes while Preserving Isomorphism

EVALUATION

Page 65: Skolemising Blank Nodes while Preserving Isomorphism

Evaluation: Real-world Graphs

Page 66: Skolemising Blank Nodes while Preserving Isomorphism

Evaluation: Nasty Synthetic Graphs

Page 67: Skolemising Blank Nodes while Preserving Isomorphism

CONCLUSIONS

Page 68: Skolemising Blank Nodes while Preserving Isomorphism

In loving memory of

Linked Data

2007–2012

Survived by its research community

_:b

1999–2015

Page 69: Skolemising Blank Nodes while Preserving Isomorphism

Conclusions

Page 70: Skolemising Blank Nodes while Preserving Isomorphism

Aside: Why GI-Hard?

Page 71: Skolemising Blank Nodes while Preserving Isomorphism

Aside: Why GI-Hard? (Can Encode Graph Isomorphism as RDF Isomorphism)

if and only if
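One simple way to see the GI-hard direction is to turn each vertex of an undirected graph into a blank node and each edge into a pair of triples with one fixed predicate. The sketch below assumes graphs without isolated vertices (they would need an extra triple each), and the function name and the predicate IRI are made up for the example; the slide's own encoding may differ in detail.

```python
def graph_to_rdf(edges, predicate="<http://example.org/edge>"):
    """Encode an undirected graph as an RDF graph made only of blank nodes."""
    triples = set()
    for u, v in edges:
        # An undirected edge becomes two triples, one per direction.
        triples.add((f"_:{u}", predicate, f"_:{v}"))
        triples.add((f"_:{v}", predicate, f"_:{u}"))
    return triples

triangle = graph_to_rdf([(1, 2), (2, 3), (3, 1)])
```

Two such undirected graphs are isomorphic if and only if their encodings are isomorphic as RDF graphs, because the only freedom left in the RDF graph is the renaming of blank nodes.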

Page 72: Skolemising Blank Nodes while Preserving Isomorphism

Aside: Why GI-Complete? (Can we encode RDF isomorphism as graph isomorphism?)

if and only if

?

?

Page 73: Skolemising Blank Nodes while Preserving Isomorphism

Aside: Why GI-Complete? (Yes: We can encode RDF isomorphism as graph isomorphism)

Page 74: Skolemising Blank Nodes while Preserving Isomorphism

Aside: Why GI-Complete? (Yes: We can encode RDF isomorphism as graph isomorphism)

if and only if

Page 75: Skolemising Blank Nodes while Preserving Isomorphism

COMPLETE CANONICAL LABELLING SCHEME

Page 76: Skolemising Blank Nodes while Preserving Isomorphism

A complete canonical labelling?

Page 77: Skolemising Blank Nodes while Preserving Isomorphism

Find a canonical labelling for H

Page 78: Skolemising Blank Nodes while Preserving Isomorphism

Choose the lowest possible graph

Page 79: Skolemising Blank Nodes while Preserving Isomorphism

COMPLETE: FIND MINIMUM POSSIBLE GRAPH USING FIXED BLANK NODE LABELS
• Complete
• Inefficient
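A brute-force version of this idea can be sketched as follows, assuming triples are tuples of strings and blank nodes start with `_:`. The function name and the fixed labels `_:b0`, `_:b1`, … are illustrative choices. It tries every assignment of the fixed labels and keeps the smallest resulting graph, so it is complete but needs k! attempts for k blank nodes.

```python
from itertools import permutations

def brute_force_canonical(triples):
    """Try every assignment of fixed labels to blank nodes; keep the minimum graph."""
    bnodes = sorted({t for s, _, o in triples for t in (s, o) if t.startswith("_:")})
    labels = [f"_:b{i}" for i in range(len(bnodes))]
    best = None
    for perm in permutations(labels):  # k! candidate labellings
        mapping = dict(zip(bnodes, perm))
        candidate = sorted(
            (mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in triples
        )
        if best is None or candidate < best:
            best = candidate
    return best

P = "<http://example.org/p>"
g1 = {("_:a", P, "_:b"), ("_:b", P, "_:c"), ("_:c", P, "_:a")}
g2 = {("_:x", P, "_:z"), ("_:z", P, "_:y"), ("_:y", P, "_:x")}
g3 = {("_:a", P, "_:b"), ("_:b", P, "_:c")}
```

Isomorphic inputs such as `g1` and `g2` yield the same minimum graph, while `g3` yields a different one.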

Page 80: Skolemising Blank Nodes while Preserving Isomorphism

The need for a graph-level hash

Page 81: Skolemising Blank Nodes while Preserving Isomorphism

OPTIMISATION: PRUNE THE TREE USING AUTOMORPHISMS

Page 82: Skolemising Blank Nodes while Preserving Isomorphism

Trim the search tree using “found” automorphisms

Found Automorphisms …

Page 83: Skolemising Blank Nodes while Preserving Isomorphism

PRUNING PER AUTOMORPHISMS AVOIDS SYMMETRIC REPETITIONS
• Automorphisms are found naturally
• Makes very “regular” structures (like cliques) a lot easier
• Need to be careful how to manage the automorphism group
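The pruning step can be illustrated with a small orbit computation: once automorphisms are known, candidate nodes that map onto each other lead to identical subtrees, so only one per orbit needs exploring. This sketch assumes automorphisms are given as dicts from blank node to blank node and that they fix every node already distinguished higher up the search path (the "careful management" the slide mentions). The function name is hypothetical.

```python
def orbit_representatives(candidates, automorphisms):
    """Keep one candidate per orbit under the given automorphisms."""
    parent = {c: c for c in candidates}  # union-find forest over the candidates

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for auto in automorphisms:
        for a, b in auto.items():
            if a in parent and b in parent:
                parent[find(a)] = find(b)  # a and b are symmetric: same orbit

    seen, reps = set(), []
    for c in candidates:
        root = find(c)
        if root not in seen:  # first candidate seen from each orbit is kept
            seen.add(root)
            reps.append(c)
    return reps

rotation = {"_:a": "_:b", "_:b": "_:c", "_:c": "_:a"}
swap = {"_:a": "_:b", "_:b": "_:a"}
by_rotation = orbit_representatives(["_:a", "_:b", "_:c"], [rotation])
by_swap = orbit_representatives(["_:a", "_:b", "_:c"], [swap])
```

With the rotation, all three candidates fall into one orbit and a single branch remains; with the swap, `_:c` still has to be explored separately.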