The Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery
![Page 1: The Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062522/58e7b9e81a28ab65578b5847/html5/thumbnails/1.jpg)
The Lazy Traveling Salesman: Memory Management for Large-Scale Link Discovery
Axel-Cyrille Ngonga Ngomo and Mofeed Hassan
University of Leipzig, Institute for Applied Informatics
May 28th, 2016, Crete, Greece
Ngonga Ngomo and Hassan (InfAI) GNOME November 10, 2016 1 / 22
![Page 2: The Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062522/58e7b9e81a28ab65578b5847/html5/thumbnails/2.jpg)
Why Link Discovery?
1. Fourth principle
2. Links are central for
   - Cross-ontology QA
   - Data Integration
   - Reasoning
   - Federated Queries
   - ...
3. Current topology of the LOD Cloud
   - 31+ billion triples
   - ≈ 0.5 billion links
   - owl:sameAs in most cases
![Page 3: The Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062522/58e7b9e81a28ab65578b5847/html5/thumbnails/3.jpg)
Why is it difficult?
Definition (Link Discovery)
- Given sets S and T of resources and a relation R
- Find M = {(s, t) ∈ S × T : R(s, t)}
- Common approach: Find M′ = {(s, t) ∈ S × T : σ(s, t) ≥ θ}

Example: R = :sameModel

:s770fm rdfs:label "S770FM"@en
:s770fm rdf:type :SABER
:s770fm :model :770
:s770fm :top :FlamedMaple
:s770fm :producer :Ibanez

:s770fm rdfs:label "S770BEM"@en
:s770fm rdf:type :SABER
:s770fm :model :770
:s770fm :top :BirdEyeMaple
:s770fm :producer :Ibanez
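The threshold-based definition of M′ can be illustrated with a brute-force sketch. This is not the authors' implementation: `difflib.SequenceMatcher` merely stands in for the similarity function σ, the labels `RG550` and `JEM7V` are made up for illustration, and θ = 0.7 is an arbitrary choice.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in for sigma: a string similarity in [0, 1] (assumption)."""
    return SequenceMatcher(None, a, b).ratio()


def brute_force_links(source, target, theta):
    """Return M' = {(s, t) : sigma(s, t) >= theta} by comparing every pair."""
    return {(s, t) for s in source for t in target if similarity(s, t) >= theta}


# Two labels from the slide plus two hypothetical ones.
S = ["S770FM", "RG550"]
T = ["S770BEM", "JEM7V"]
links = brute_force_links(S, T, theta=0.7)
print(links)
```

The nested comprehension performs |S| · |T| comparisons, which is the quadratic a-priori runtime that the rest of the talk tries to avoid.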
![Page 6: The Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062522/58e7b9e81a28ab65578b5847/html5/thumbnails/6.jpg)
Why is it difficult?
1. Time complexity
   - Large number of triples
   - Quadratic a-priori runtime
   - 69 days for mapping cities from DBpedia to Geonames
   - Solutions usually in-memory
   - Insufficient memory on commodity hardware
2. Complexity of specifications
   - Combination of several attributes required for high precision
   - Tedious discovery of most adequate mapping
   - Dataset-dependent similarity functions
![Page 8: The Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062522/58e7b9e81a28ab65578b5847/html5/thumbnails/8.jpg)
Problem Statement
Assumptions
- Constant memory C
- |S| + |T| > |C|

Goal
- Devise time-efficient approach to compute M′
- Ensure completeness of results

Solution: Gnome
![Page 9: The Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062522/58e7b9e81a28ab65578b5847/html5/thumbnails/9.jpg)
Divide and Merge
Time-Efficient Link Discovery
- Insight: Most approaches rely on the divide-and-merge paradigm
- Example: HR3

σ(s, t) ≥ θ ⇔ δ(s, t) ≤ ∆
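The equivalence between a similarity threshold θ and a distance threshold ∆ can be checked numerically. The slide does not say how σ is derived from δ; the sketch below assumes σ = 1 / (1 + δ), which gives ∆ = (1 − θ) / θ. Both function names are hypothetical.

```python
def sim_from_dist(delta: float) -> float:
    """Assumed conversion from a distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + delta)


def distance_threshold(theta: float) -> float:
    """Distance bound that corresponds to theta under the assumed conversion."""
    return (1.0 - theta) / theta


theta = 0.5
max_dist = distance_threshold(theta)

# Both sides of the equivalence agree for every sampled distance.
for d in [0.0, 0.5, 1.0, 1.5, 4.0]:
    assert (sim_from_dist(d) >= theta) == (d <= max_dist)
```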
![Page 10: The Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062522/58e7b9e81a28ab65578b5847/html5/thumbnails/10.jpg)
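Divide and Merge
1. Define $\mathcal{S} = \{S_1, \ldots, S_n\}$ with $S_i \subseteq S \wedge \bigcup_i S_i = S$
2. Define $\mathcal{T} = \{T_1, \ldots, T_m\}$ with $T_j \subseteq T \wedge \bigcup_j T_j = T$
3. Find a mapping function $\mu : \mathcal{S} \to 2^{\mathcal{T}}$ such that
   - elements of each $S_i$ need only be compared with elements of the sets in $\mu(S_i)$
   - the union of the results over all $S_i \in \mathcal{S}$ is exactly M′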
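The three steps can be sketched on a toy problem. The example below is an illustration, not HR3: resources are plain integers, δ is the absolute difference, blocks are intervals of width ∆, and µ maps each source block to the target block with the same index and its two neighbours. All names are hypothetical.

```python
from collections import defaultdict


def divide(values, width):
    """Split values into blocks keyed by floor(value / width)."""
    blocks = defaultdict(list)
    for v in values:
        blocks[int(v // width)].append(v)
    return blocks


def divide_and_merge(source, target, max_dist):
    """Compute {(s, t) : |s - t| <= max_dist} block by block."""
    s_blocks = divide(source, max_dist)
    t_blocks = divide(target, max_dist)
    result = set()
    for key, s_block in s_blocks.items():
        # mu(S_i): only the target blocks with index key - 1, key, key + 1.
        for neighbour in (key - 1, key, key + 1):
            for t in t_blocks.get(neighbour, []):
                for s in s_block:
                    if abs(s - t) <= max_dist:
                        result.add((s, t))
    return result


source = [1, 5, 9, 14]
target = [2, 8, 20]
matches = divide_and_merge(source, target, max_dist=2)
print(sorted(matches))
```

Two values at distance at most ∆ fall into blocks whose indices differ by at most one, so skipping all other block pairs does not lose any result.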
![Page 12: The Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062522/58e7b9e81a28ab65578b5847/html5/thumbnails/12.jpg)
Task Graph
Definition
- A task $E_{ij}$ stands for comparing $S_i$ with $T_j \in \mu(S_i)$
- Task graph $G = (V, E, w_v, w_e)$, with
  - $V = \mathcal{S} \cup \mathcal{T}$
  - $w_v(v) = |v|$
  - $w_e(e_{ij}) = |S_i| |T_j|$
[Figure: example task graph. Node weights: S1 = 3, S2 = 2, S3 = 4, T1 = 2, T2 = 1, T3 = 3. Edge weights: 6, 3, 4, 2, 4, 12, 6.]
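A task graph with these weights can be built in a few lines. The node sizes are taken from the figure; the mapping `mu` is an inference, chosen because it is the only set of seven S-T edges whose products |Si||Tj| give the edge weights listed in the figure.

```python
# Node weights w_v(v) = |v| as shown in the figure.
sizes = {"S1": 3, "S2": 2, "S3": 4, "T1": 2, "T2": 1, "T3": 3}

# Assumed mapping mu (inferred from the edge weights, not stated on the slide).
mu = {"S1": ["T1", "T2"], "S2": ["T1", "T2", "T3"], "S3": ["T2", "T3"]}


def task_graph(sizes, mu):
    """Return node weights and edge weights w_e(e_ij) = |S_i| * |T_j|."""
    node_weights = dict(sizes)
    edge_weights = {
        (s, t): sizes[s] * sizes[t] for s, targets in mu.items() for t in targets
    }
    return node_weights, edge_weights


node_w, edge_w = task_graph(sizes, mu)
for task, weight in edge_w.items():
    print(task, weight)
```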
![Page 16: The Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062522/58e7b9e81a28ab65578b5847/html5/thumbnails/16.jpg)
Problem
Locality maximization
- Question: What if V does not fit in memory?
- Insight: Main bottleneck is access to hard drive.
- Solution:
  1. Find groups of nodes that fit in memory and
  2. Compute sequence of groups that minimizes hard drive access
![Page 17: The Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062522/58e7b9e81a28ab65578b5847/html5/thumbnails/17.jpg)
Clustering: Naïve Approach
Approach
- Cluster by $S_i$

Example: |C| = 7
[Figure: example task graph with the clusters produced by the naïve approach for |C| = 7.]
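One way to read "cluster by Si" is sketched below. The slide gives no pseudocode, so the splitting rule is an assumption: the tasks of each Si are packed in order, and a new cluster is opened whenever the nodes of the current one would exceed |C|. The mapping `mu` is inferred from the edge weights of the example graph.

```python
sizes = {"S1": 3, "S2": 2, "S3": 4, "T1": 2, "T2": 1, "T3": 3}
mu = {"S1": ["T1", "T2"], "S2": ["T1", "T2", "T3"], "S3": ["T2", "T3"]}


def naive_clustering(sizes, mu, capacity):
    """Group the tasks of each S_i; split when the nodes no longer fit.

    Assumes every single task (S_i plus one T_j) fits into the capacity.
    """
    clusters = []
    for s, targets in mu.items():
        tasks, nodes = [], set()
        for t in targets:
            needed = nodes | {s, t}
            if tasks and sum(sizes[n] for n in needed) > capacity:
                clusters.append(tasks)      # close the full cluster
                tasks, needed = [], {s, t}  # open a new one for the same S_i
            tasks.append((s, t))
            nodes = needed
        if tasks:
            clusters.append(tasks)
    return clusters


clusters = naive_clustering(sizes, mu, capacity=7)
for cluster in clusters:
    print(cluster)
```

Under these assumptions the example yields five clusters, because the tasks of S2 and S3 each need nodes of total size 8 and therefore have to be split.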
![Page 22: The Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062522/58e7b9e81a28ab65578b5847/html5/thumbnails/22.jpg)
Clustering: Greedy Approach

- Start with the largest task
- Add the largest connected tasks until none fits in C

Example: |C| = 7
[Figure: example task graph with the clusters produced by the greedy approach for |C| = 7.]
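The two greedy rules can be sketched as follows. This is an interpretation rather than the authors' code: "connected" is taken to mean "shares a node with the cluster", ties between equally large tasks are broken by name, and `mu` is inferred from the edge weights of the example graph.

```python
sizes = {"S1": 3, "S2": 2, "S3": 4, "T1": 2, "T2": 1, "T3": 3}
mu = {"S1": ["T1", "T2"], "S2": ["T1", "T2", "T3"], "S3": ["T2", "T3"]}


def greedy_clustering(sizes, mu, capacity):
    """Seed each cluster with the largest task, then grow it greedily."""
    weight = {(s, t): sizes[s] * sizes[t] for s, ts in mu.items() for t in ts}
    remaining = set(weight)
    clusters = []
    while remaining:
        # Largest remaining task; the task name breaks ties deterministically.
        seed = max(remaining, key=lambda e: (weight[e], e))
        cluster, nodes = [seed], set(seed)
        remaining.remove(seed)
        while True:
            # Tasks that share a node with the cluster and still fit in memory.
            fitting = [
                e for e in remaining
                if nodes & set(e)
                and sum(sizes[n] for n in nodes | set(e)) <= capacity
            ]
            if not fitting:
                break
            best = max(fitting, key=lambda e: (weight[e], e))
            cluster.append(best)
            nodes |= set(best)
            remaining.remove(best)
        clusters.append(cluster)
    return clusters


clusters = greedy_clustering(sizes, mu, capacity=7)
for cluster in clusters:
    print(cluster)
```

With |C| = 7 this sketch produces four clusters, whose task weights are (12), (6, 4), (6, 3) and (4, 2).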
![Page 28: The Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062522/58e7b9e81a28ab65578b5847/html5/thumbnails/28.jpg)
Scheduling
- Output of clustering is a sequence $G_1, \ldots, G_N$ of clusters
- Intuition: Consecutive clusters should share data
- Overlap of two clusters: $o(G_i, G_j) = \sum_{v \in V(G_i) \cap V(G_j)} |v|$
- Overlap of a sequence: $o(G_1, \ldots, G_N) = \sum_{i=1}^{N-1} o(G_i, G_{i+1})$
[Figure: the clustered example task graph.]
Goal
- Maximize overlap of the generated sequence
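Both overlap definitions translate directly into code. The node sets of G1 to G4 below are an assumption: they are reconstructed so that the pairwise overlaps agree with the values shown in the scheduling examples of this deck.

```python
sizes = {"S1": 3, "S2": 2, "S3": 4, "T1": 2, "T2": 1, "T3": 3}

# Assumed node sets V(G_i) of four clusters (reconstruction, not from the slide).
clusters = {
    "G1": {"S3", "T3"},
    "G2": {"S2", "T1", "T3"},
    "G3": {"S1", "T1", "T2"},
    "G4": {"S2", "S3", "T2"},
}


def overlap(a, b):
    """o(G_a, G_b): total size of the nodes that both clusters need."""
    return sum(sizes[v] for v in clusters[a] & clusters[b])


def sequence_overlap(seq):
    """o(G_1, ..., G_N): sum of the overlaps of consecutive clusters."""
    return sum(overlap(x, y) for x, y in zip(seq, seq[1:]))


print(sequence_overlap(["G3", "G1", "G2", "G4"]))
print(sequence_overlap(["G3", "G2", "G1", "G4"]))
```

A larger sequence overlap means that more data stays in memory between consecutive clusters, so fewer nodes have to be reloaded from the hard drive.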
![Page 29: The Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062522/58e7b9e81a28ab65578b5847/html5/thumbnails/29.jpg)
Scheduling
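Best-Effort
- Select random pair of clusters
- If permutation improves overlap, then permute
- Relies on local knowledge for scalability

Trick:

$$\Delta(G_i, G_j) = \big(o(G_{i-1}, G_i) + o(G_i, G_{i+1}) + o(G_{j-1}, G_j) + o(G_j, G_{j+1})\big) - \big(o(G_{i-1}, G_j) + o(G_j, G_{i+1}) + o(G_{j-1}, G_i) + o(G_i, G_{j+1})\big) \quad (1)$$

Example sequences, with the overlap of each consecutive pair in brackets:

G3 [0] G1 [3] G2 [2] G4
G4 [4] G1 [3] G2 [2] G3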
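A sketch of the best-effort scheduler, assuming a swap is accepted whenever ∆ from Eq. (1) is negative, i.e. whenever the overlap after the swap exceeds the overlap before it. Unlike the formula, the helper collects the affected neighbour pairs in a set, so adjacent positions and sequence boundaries need no special cases. The cluster node sets are reconstructed to match the overlaps in the example; all names are hypothetical.

```python
import random

sizes = {"S1": 3, "S2": 2, "S3": 4, "T1": 2, "T2": 1, "T3": 3}
nodes = {
    "G1": {"S3", "T3"},
    "G2": {"S2", "T1", "T3"},
    "G3": {"S1", "T1", "T2"},
    "G4": {"S2", "S3", "T2"},
}


def overlap(a, b):
    return sum(sizes[v] for v in nodes[a] & nodes[b])


def local_overlap(seq, positions, overlap):
    """Sum the overlaps of the consecutive pairs touching the given positions."""
    pairs = set()
    for p in positions:
        if p > 0:
            pairs.add((p - 1, p))
        if p < len(seq) - 1:
            pairs.add((p, p + 1))
    return sum(overlap(seq[a], seq[b]) for a, b in pairs)


def swap_delta(seq, i, j, overlap):
    """Delta as in Eq. (1): local overlap before the swap minus after it."""
    before = local_overlap(seq, (i, j), overlap)
    seq[i], seq[j] = seq[j], seq[i]
    after = local_overlap(seq, (i, j), overlap)
    seq[i], seq[j] = seq[j], seq[i]  # restore the original order
    return before - after


def best_effort(seq, overlap, iterations, seed=0):
    """Try random swaps and keep those that increase the overlap."""
    rng = random.Random(seed)
    seq = list(seq)
    for _ in range(iterations):
        i, j = rng.sample(range(len(seq)), 2)
        if swap_delta(seq, i, j, overlap) < 0:
            seq[i], seq[j] = seq[j], seq[i]
    return seq


start = ["G3", "G1", "G2", "G4"]
print(swap_delta(list(start), 0, 3, overlap))
print(best_effort(start, overlap, iterations=50))
```

Only the neighbours of the two swapped positions are inspected, so each step costs a constant number of overlap computations regardless of the sequence length.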
![Page 31: The Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062522/58e7b9e81a28ab65578b5847/html5/thumbnails/31.jpg)
Scheduling
Greedy
- Start with random cluster
- Choose next cluster with largest overlap
- Global knowledge needed

Example sequences, with the overlap of each consecutive pair in brackets:

G3 [0] G1 [3] G2 [2] G4
G3 [2] G2 [3] G1 [4] G4
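The greedy scheduler can be sketched as a nearest-neighbour walk over the clusters. The start cluster is passed in explicitly to keep the example deterministic (the slide picks it at random), and the cluster node sets are reconstructed to match the overlaps in the example.

```python
sizes = {"S1": 3, "S2": 2, "S3": 4, "T1": 2, "T2": 1, "T3": 3}
nodes = {
    "G1": {"S3", "T3"},
    "G2": {"S2", "T1", "T3"},
    "G3": {"S1", "T1", "T2"},
    "G4": {"S2", "S3", "T2"},
}


def overlap(a, b):
    return sum(sizes[v] for v in nodes[a] & nodes[b])


def greedy_schedule(names, overlap, start):
    """Repeatedly append the unscheduled cluster that overlaps most with the last one."""
    remaining = [n for n in names if n != start]
    seq = [start]
    while remaining:
        nxt = max(remaining, key=lambda n: overlap(seq[-1], n))
        seq.append(nxt)
        remaining.remove(nxt)
    return seq


schedule = greedy_schedule(["G1", "G2", "G3", "G4"], overlap, start="G3")
print(schedule)
```

Each step scans all unscheduled clusters, which is the global knowledge the slide refers to and the reason this variant scales worse than the best-effort one.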
![Page 33: The Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062522/58e7b9e81a28ab65578b5847/html5/thumbnails/33.jpg)
Experimental Setup
Datasets
1. DBP: 1 million labels from DBpedia version 04-2015
2. LGD: 0.8 million places from LinkedGeoData

Hardware
1. Intel Xeon E5-2650 v3 processors (2.30GHz)
2. Ubuntu 14.04.3 LTS
3. 10GB RAM

Measures
1. Total runtime
2. Hit ratio
![Page 34: The Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062522/58e7b9e81a28ab65578b5847/html5/thumbnails/34.jpg)
Evaluation of Clustering
- Only show results of LGD
- Results on DBP lead to similar insights

| \|C\| | Runtime (ms), Naive | Runtime (ms), Greedy | Hit ratio, Naive | Hit ratio, Greedy |
|---|---|---|---|---|
| 100 | 568.0 | 646.3 | 0.57 | 0.77 |
| 200 | 518.3 | 594.0 | 0.66 | 0.80 |
| 400 | 532.0 | 593.3 | 0.67 | 0.80 |
| 1,000 | 5,974.0 | 118,454.7 | 0.51 | 0.64 |
| 2,000 | 6,168.0 | 115,450.0 | 0.51 | 0.63 |
| 4,000 | 7,118.3 | 121,901.7 | 0.50 | 0.63 |

Conclusion
1. Naïve approach is more efficient
2. Greedy approach is more effective
3. Select naïve approach for clustering
![Page 36: The Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062522/58e7b9e81a28ab65578b5847/html5/thumbnails/36.jpg)
Evaluation of Scheduling
- Only show results of LGD
- Results on DBP lead to similar insights

| \|C\| | Runtime (ms), Best-Effort | Runtime (ms), Greedy | Hit ratio, Best-Effort | Hit ratio, Greedy |
|---|---|---|---|---|
| 100 | 571.3 | 1,599.3 | 0.56 | 0.68 |
| 200 | 565.7 | 1,448.3 | 0.66 | 0.85 |
| 400 | 581.0 | 1,379.3 | 0.67 | 0.88 |
| 1,000 | 5,666.0 | 814,271.7 | 0.51 | 0.86 |
| 2,000 | 6,268.0 | 810,855.0 | 0.51 | 0.86 |
| 4,000 | 6,675.7 | 814,041.7 | 0.50 | 0.86 |

Conclusion
1. Best-effort approach is more time-efficient
2. Best-effort approach is to be used for scheduling
![Page 38: The Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062522/58e7b9e81a28ab65578b5847/html5/thumbnails/38.jpg)
GNOME vs. Caching Approaches
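Runtimes (ms)

| \|C\| | Gnome | FIFO | F2 | LFU | LRU | SLRU |
|---|---|---|---|---|---|---|
| 1,000 | 5,974.0 | 37,161.0 | 42,090.3 | 45,906.7 | 54,194.3 | 56,904.3 |
| 2,000 | 6,168.0 | 31,977.0 | 39,071.3 | 39,872.0 | 45,473.0 | 46,795.0 |
| 4,000 | 7,118.3 | 21,337.0 | 40,860.0 | 28,028.3 | 26,816.7 | 27,200.0 |

Hit ratio

| \|C\| | Gnome | FIFO | F2 | LFU | LRU | SLRU |
|---|---|---|---|---|---|---|
| 1,000 | 0.51 | 0.17 | 0.16 | 0.19 | 0.17 | 0.17 |
| 2,000 | 0.51 | 0.29 | 0.30 | 0.32 | 0.30 | 0.30 |
| 4,000 | 0.51 | 0.54 | 0.55 | 0.59 | 0.55 | 0.56 |

Conclusion
1. Gnome is more time-efficient
2. Leads to higher hit rates in most cases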
![Page 39: The Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062522/58e7b9e81a28ab65578b5847/html5/thumbnails/39.jpg)
Scalability
| Dataset | 100k | 200k | 400k | 800k |
|---|---|---|---|---|
| LGD | 362,141.3 | 1,452,922.0 | 5,934,038.7 | 20,001,965.7 |
| DBP | 434,630.7 | 1,790,350.7 | 6,677,923.0 | 12,653,403.3 |

Conclusion
- Sub-quadratic growth of runtime
- Runtime grows linearly with number of mappings
- For LGD, 360 – 370 mappings/s
![Page 40: The Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062522/58e7b9e81a28ab65578b5847/html5/thumbnails/40.jpg)
Conclusion and Future Work
- Presented Gnome
  - Two-step approach for link discovery
  - Relies on divide-and-merge paradigm
  - Ensures LD on datasets of arbitrary size
  - Compared with state-of-the-art caching
- Future Work
  - Parallel implementation
  - Combination with blocking
![Page 41: The Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062522/58e7b9e81a28ab65578b5847/html5/thumbnails/41.jpg)
That’s all Folks!
Axel Ngonga
AKSW Research Group
Augustusplatz 10, Room P905
04109 Leipzig, Germany
[email protected]
http://limes.sf.net
![Page 42: The Lazy Traveling Salesman Memory Management for Large-Scale Link Discovery](https://reader035.fdocuments.us/reader035/viewer/2022062522/58e7b9e81a28ab65578b5847/html5/thumbnails/42.jpg)
Acknowledgement
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).