The Empirical Turn in Knowledge Representation
![Page 1: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/1.jpg)
Creative Commons CC BY 3.0:
allowed to share & remix
(also commercial)
but must attribute
Frank van Harmelen
The empirical turn in
Knowledge Representation
Contributions from many people in the KR&R group over many years.
And thanks to NWO for a 750k€ TOP grant for this
![Page 2: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/2.jpg)
KR in the pre-empirical era
![Page 3: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/3.jpg)
Handbook of Knowledge Representation (1000 pages, ToC alone is 14 pages)
• propositional logic & satisfiability solvers
• first order logic & resolution
• description logic
• constraint (logic) programming
• nonmonotonic reasoning
• belief revision
• qualitative reasoning
• model-based diagnosis
• bayesian networks
• temporal logic
• spatial reasoning
• epistemic logic
• deontic logic
• situation calculus
• default logic
• event calculus
• ……
![Page 4: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/4.jpg)
KR metrics in the pre-empirical era
KR = logic
• Show small examples
• Prove properties (expressivity, complexity)
• Give algorithms (sound, complete)
KR = engineering
• Build applications
• Show high performance
• Show low engineering costs
![Page 5: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/5.jpg)
BUT AN EXPERIMENT IN THE PAST 10 YEARS
MADE IT POSSIBLE TO DO SOMETHING VERY DIFFERENT:
OBSERVE HOW KNOWLEDGE REPRESENTATIONS BEHAVE
AT VERY LARGE SCALE
![Page 6: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/6.jpg)
![Page 7: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/7.jpg)
Rest of the talk
• Which KR’s were part of the experiment?
• How much of it was there to observe?
• How did we manage to observe it?
• What did we learn from observing it?
![Page 8: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/8.jpg)
Which KR’s ?
![Page 9: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/9.jpg)
RDF (for non-logicians)
![Page 10: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/10.jpg)
RDF (for logicians)
• Ground binary predicate: $P(O_1, O_2)$
• Limited existential variables: $\exists x: P(C_1, x) \land P(C_2, x)$
• Type is unary predicate: $T_i(x)$
• Subtypes: $\forall x: T_1(x) \rightarrow T_2(x)$
• Type restrictions: $\forall x, y: P(x, y) \rightarrow T_1(x) \land T_2(y)$
• Equality: $O_1 = O_2$
• Extensions to DL:
– Disjointness of types
– Cardinality restrictions (0,1)
– always decidable: sub-FOL.
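The subtype and type-restriction rules above can be sketched as a small forward-chaining loop. This is only an illustration, not the deduction procedure from the talk: the function name `deduce` and the `ex:` example terms are hypothetical, and only three RDFS-style rules (subclass, domain, range) are covered.

```python
# Facts are (subject, predicate, object) triples, i.e. ground binary predicates P(O1, O2).
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"


def deduce(triples):
    """Forward-chain three RDFS-style rules until no new triple appears."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                # Subtypes: T1(x) and (forall x: T1(x) -> T2(x)) give T2(x).
                if p == TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, TYPE, o2))
                # Type restrictions: P(x, y) gives T1(x) (domain) ...
                if p2 == DOMAIN and p == s2:
                    new.add((s, TYPE, o2))
                # ... and T2(y) (range).
                if p2 == RANGE and p == s2:
                    new.add((o, TYPE, o2))
        if new <= facts:
            return facts
        facts |= new


# Hypothetical example data, not taken from any real dataset.
closure = deduce({
    ("ex:aspirin", "ex:treats", "ex:headache"),
    ("ex:treats", DOMAIN, "ex:Drug"),
    ("ex:treats", RANGE, "ex:Symptom"),
    ("ex:Drug", SUBCLASS, "ex:Chemical"),
})
```

The quadratic inner loop is chosen for readability; it would not scale to billions of facts.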
![Page 11: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/11.jpg)
RDF deduction
![Page 12: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/12.jpg)
OWL Semantics
![Page 13: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/13.jpg)
How much is there to observe?
![Page 14: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/14.jpg)
± 45-100 billion facts
![Page 15: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/15.jpg)
1 fact
How big is 100 billion
![Page 16: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/16.jpg)
Denny Vrandečić – AIFB, Universität Karlsruhe ≈ 1 fact per web-page
100 billion golfballs ≈ Jupiter
![Page 17: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/17.jpg)
[Figure: the triple [<x> IsOfType <T>] linking x and T, which have different owners & locations; example type: <analgesic>]
BTW: How did it get so big?
On the Web, anybody can say anything about anything
![Page 18: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/18.jpg)
BTW: How did it get so big?
On the Web, anybody can say anything about anything
[Figure: x and T linked by R]
![Page 19: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/19.jpg)
How did you manage to observe it?
![Page 20: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/20.jpg)
![Page 21: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/21.jpg)
![Page 22: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/22.jpg)
LOD Laundromat: Beek & Rietveld et al. 2014, LOD laundromat: a uniform way of publishing other people's dirty data, http://lodlaundromat.org/pdf/lodlaundry.pdf
HDT: Fernández & Martínez-Prieto & Gutiérrez, 2013, Binary RDF representation for publication and exchange (HDT)
LDF: Verborgh & Vander Sande et al. 2014, Web-Scale Querying through Linked Data Fragments
![Page 24: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/24.jpg)
Surprisingly efficient
1 file
28,362,198,927 unique triples
>650K data documents
524 GB of disk space
16 GB of RAM
Only €305,- hardware cost
Meta-Data for a lot of LOD: http://www.semantic-web-journal.net/content/meta-data-lot-lod-2
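A quick back-of-the-envelope calculation shows what these figures mean per triple. It uses only the numbers on this slide; reading "GB" as 10^9 bytes is an assumption.

```python
triples = 28_362_198_927       # unique triples in the single file
disk_bytes = 524 * 10**9       # 524 GB, assuming GB = 10^9 bytes
hardware_eur = 305             # hardware cost in euros

# Average storage per triple, and hardware cost per billion triples.
bytes_per_triple = disk_bytes / triples
eur_per_billion_triples = hardware_eur / (triples / 10**9)

print(f"{bytes_per_triple:.1f} bytes per triple")
print(f"{eur_per_billion_triples:.2f} euro per billion triples")
```

Under that assumption this works out to between 18 and 19 bytes of disk per triple, and between 10 and 11 euro of hardware per billion triples.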
![Page 25: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/25.jpg)
Statistics (boring)
triples 28,362,198,927
subjects 3,214,347,198
predicates 1,168,932
objects 3,178,409,386
literals 5.3B
![Page 26: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/26.jpg)
Re-use is fairly high… or not…
![Page 27: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/27.jpg)
Analysing Logical identity
Joe Raad, Wouter Beek (ESWC 2018, under submission)
![Page 28: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/28.jpg)
Identity clusters
1. Extracting all owl:sameAs statements on the LOD
LOD-a-lot file [Fernández 2017]: http://lod-a-lot.lod.labs.vu.nl
≈ 4 hours
HDT file (4.5 GB)
558 million owl:sameAs statements (309 million distinct terms)
![Page 29: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/29.jpg)
2. Generating the Identity Closure
HDT file (4.5 GB)
Identity Closure 1, Identity Closure 2, …, Identity Closure 89 387 082
x owl:sameAs y, z owl:sameAs y
Identity Closure: {x, y, z}
- The largest Identity Closure contains 177 794 terms (contains all the countries in the world, Albert Einstein, « empty string », etc.)
- The smallest Identity Closure contains 2 terms
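One common way to compute such closures is a union-find (disjoint-set) structure over the owl:sameAs pairs. The sketch below is a minimal in-memory illustration under that assumption; the slides do not say which algorithm was used, and the function name `identity_closures` is hypothetical.

```python
from collections import defaultdict


def identity_closures(same_as_pairs):
    """Group terms into identity closures: sets of terms connected by
    owl:sameAs links, treated as symmetric and transitive."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        root = term
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every term on the path directly at the root.
        while parent[term] != root:
            parent[term], term = root, parent[term]
        return root

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    closures = defaultdict(set)
    for term in list(parent):
        closures[find(term)].add(term)
    # A closure needs at least 2 terms; x owl:sameAs x alone is dropped.
    return [terms for terms in closures.values() if len(terms) >= 2]


# The example from the slide, plus one unrelated pair.
closures = identity_closures([("x", "y"), ("z", "y"), ("a", "b")])
```

Here `closures` holds two sets, {x, y, z} and {a, b}.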
![Page 30: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/30.jpg)
![Page 31: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/31.jpg)
Identity Closure « Cities »
3. Detecting Communities (using the Louvain Algorithm)
This network (i.e. identity closure) has a community structure, as it can be grouped into different sets of nodes, with each set of nodes being densely connected internally.
Goal: Find (and later Evaluate) the most “suspicious” identity links (i.e. the links between different communities)
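Once every term has a community label, the "suspicious" links are simply those whose endpoints carry different labels. The sketch below assumes the labels have already been produced by a community-detection run (it does not implement the Louvain algorithm itself); all terms, links and labels are hypothetical toy data.

```python
from collections import Counter


def inter_community_links(links, community_of):
    """Return the identity links whose two endpoints lie in different communities."""
    return [(a, b) for a, b in links if community_of[a] != community_of[b]]


# Toy identity closure (hypothetical terms and links).
links = [
    ("ex:Obama", "ex:Barack_Obama"),
    ("ex:Barack_Obama", "ex:President_Obama"),
    ("ex:Obama_Administration", "ex:Obama_Cabinet"),
    ("ex:Barack_Obama", "ex:Obama_Administration"),  # crosses communities
]
# Community labels, assumed to come from a prior community-detection step.
community_of = {
    "ex:Obama": 0,
    "ex:Barack_Obama": 0,
    "ex:President_Obama": 0,
    "ex:Obama_Administration": 3,
    "ex:Obama_Cabinet": 3,
}

suspicious = inter_community_links(links, community_of)
# Number of suspicious links between each pair of communities.
links_per_pair = Counter(
    tuple(sorted((community_of[a], community_of[b]))) for a, b in suspicious
)
```

Counting links per community pair gives a short list of candidate errors to evaluate by hand.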
![Page 32: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/32.jpg)
4. Application: debugging identity statements
Identity closure containing the term
“dbpedia.org/page/Barack_Obama”
This Identity Closure contains 388 terms (i.e. 387 distinct terms are owl:sameAs this term)
95 communities detected
largest community = 99 terms
![Page 33: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/33.jpg)
4. Application: debugging identity statements
[Figure: communities comm0 and comm3, connected by 2 links]
Community 0
1. dbpedia.org/resource/B_hussein_obama
2. dbpedia.org/resource/Barack_H_Obama,_Jr
3. dbpedia.org/resource/Barak_hussein_obama
4. dbpedia.org/resource/President_Barack
5. dbpedia.org/resource/Senator_Barack_Obama
6. dbpedia.org/resource/Obama
…
99. dbpedia.org/resource/Hussein_Obama
Community 3
1. dbpedia.org/resource/Presidency_of_Barack_Obama
2. dbpedia.org/resource/Barack_Obama_Administration
3. dbpedia.org/resource/Barack_Obama_Cabinet
4. dbpedia.org/resource/Obama_White_House
5. dbpedia.org/resource/Obama_regime
6. dbpedia.org/resource/America_under_Obama
…
52. dbpedia.org/resource/Presidential_transition_of_Barack_Obama
![Page 34: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/34.jpg)
Symbols or words?
Steven de Rooij, Peter Bloem, Wouter Beek (ISWC 2016)
http://www.cs.vu.nl/~frankh/postscript/ISWC2016.pdf
![Page 35: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/35.jpg)
Symbols or words?
Symbol names are supposed to be meaningless
[Figure: Aspirin (drug) treats headache (symptom); analgesic treats pain]
![Page 36: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/36.jpg)
Measure mutual information content between string and semantics of a symbol
$E(x)$ = length of an efficient encoding of $x$
Mutual information content:
$M(x, y) = E(x) + E(y) - E(x, y)$
Take $x$ = the name of the symbol, as a string
Take $y_1$ = {types of the symbol} ≈ semantics of the symbol
Take $y_2$ = {properties of the symbol} ≈ semantics of the symbol
Calculate $M(x, y_1)$ and $M(x, y_2)$ for all symbols in 600k datasets
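The formula for M can be tried out with an off-the-shelf compressor standing in for the efficient encoding E. The sketch below uses zlib purely as an assumption (the slides do not name the encoder), and it uses long synthetic byte strings because zlib's fixed overhead would swamp real, short symbol names.

```python
import random
import zlib


def code_length(data: bytes) -> int:
    """E(x): length in bytes of a zlib encoding of x (a stand-in for an efficient code)."""
    return len(zlib.compress(data, 9))


def mutual_information(x: bytes, y: bytes) -> int:
    """M(x, y) = E(x) + E(y) - E(x, y), with E(x, y) taken as the
    encoding of x followed by y."""
    return code_length(x) + code_length(y) - code_length(x + y)


# Synthetic data: two independent pseudo-random byte strings.
rng = random.Random(0)
x = bytes(rng.getrandbits(8) for _ in range(2000))
unrelated = bytes(rng.getrandbits(8) for _ in range(2000))

m_same = mutual_information(x, x)               # y fully determined by x
m_unrelated = mutual_information(x, unrelated)  # y shares nothing with x
```

When y repeats x, encoding the pair costs barely more than encoding x alone, so `m_same` is large; for independent strings, `m_unrelated` stays close to the compressor's fixed overhead.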
![Page 37: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/37.jpg)
But variables do encode meaning!
Fraction of datasets with redundancy for types/predicates
at significance level > 0.99
BTW, this is 600,000 data points (RDF docs)
![Page 38: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/38.jpg)
Very different network structures
for different predicates
Tobias Kuhn, Wouter Beek
http://ceur-ws.org/Vol-1946/paper-05.pdf
![Page 39: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/39.jpg)
skos:exactMatch
![Page 40: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/40.jpg)
foaf:knows
![Page 41: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/41.jpg)
osspr:contains
![Page 42: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/42.jpg)
Geopolitics:hasborderWith
![Page 43: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/43.jpg)
Summary &
So what…
![Page 44: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/44.jpg)
• We now have larger KB’s than ever before
• We now have the instruments to observe and analyse these very large KB’s
• We can use these insights for better tools:
– query & inference
– publish & maintain
– visualise & explain
– …
![Page 45: The Empirical Turn in Knowledge Representation](https://reader033.fdocuments.us/reader033/viewer/2022052308/5a64c5d57f8b9a900f8b48e1/html5/thumbnails/45.jpg)
But my secret hope is that this will help us to understand the patterns of knowledge:
AI as a computational theory of knowledge