OWL reasoning with WebPIE: calculating the closure of 100 billion triples


Transcript of OWL reasoning with WebPIE: calculating the closure of 100 billion triples

Page 1: OWL reasoning with WebPIE: calculating the closure of 100 billion triples

Presented by: Mahdi Atawna

Page 2: Outline

Introduction. Paper motivation. Methodology. MapReduce. WebPIE and OWL challenges. Experiment. Results and conclusion. Criticism.

Page 3: About the paper

Authored by Jacopo Urbani, Spyros Kotoulas, Jason Maassen, Frank van Harmelen, and Henri Bal from Vrije Universiteit Amsterdam. It is an extension of a previously published paper, "Scalable Distributed Reasoning using MapReduce" (2009), which focused on handling the reasoning of RDFS data only.

This paper, published in 2010, extends the approach introduced in the previous paper to handle the more complex OWL semantics.

Page 4: Definitions

Semantic Reasoner: a piece of software able to infer logical consequences from a set of asserted facts or axioms.

MapReduce: a programming model that allows for massive scalability across a large number of servers in a cluster.

Page 5: Paper motivation

Most previous reasoning methods share a common problem: they are centralized. Their performance depends on improving the hardware and data structures of a single computer, an approach that reaches its limit quickly on large data-sets.

Page 6: Research problem

Develop a method to handle large-scale data. The new method uses a scalable distributed approach that performs the processing in parallel. With this approach, performance can be scaled along two dimensions: first, the hardware of each node; second, the number of nodes.

Page 7: OWL & RDFS

(Comparison of OWL and RDFS; the figure itself is not included in the transcript.)

Page 8: Methodology

The researchers present a new method to handle large-scale data using a scalable distributed approach that performs the processing in parallel.

With this approach, performance can be scaled along two dimensions: the hardware of each node and the number of nodes.

To realize this approach, they used MapReduce: a programming model that allows for massive scalability across a large number of servers in a cluster.

Page 9: MapReduce!

The term MapReduce refers to two separate tasks:

Map: takes a large set of data and breaks it down into tuples (key/value pairs).

Reduce: performed after Map; it takes the Map output as input and reduces it into a smaller set of tuples by combining them.

Pages 10-13: MapReduce example

(The example figures from these slides are not included in the transcript.)
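Since the example figures are missing, here is a minimal stand-in sketch of the classic word-count MapReduce job in Python. It is illustrative only (not taken from the slides); the documents and function names are invented for the example.

```python
from collections import defaultdict

def map_phase(document):
    # Map: break the input into (key, value) tuples, here (word, 1).
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does
    # automatically between the Map and Reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine the tuples for one key into a smaller result, here a sum.
    return (key, sum(values))

documents = ["the cat sat", "the dog sat"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
print(counts)  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

The important property for what follows is that all values sharing a key end up at the same reducer; on Hadoop this grouping happens automatically during the shuffle.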

Page 14: The previous paper

The previous paper focused on RDFS, where the closure of the RDF input can be computed, reaching a fixpoint, by applying all rules repeatedly until no new data is derived.

This is easy to implement for rules with a single antecedent, but for rules with multiple antecedents the implementation is challenging because it requires a join between the related triples.

Page 15: Example of a multi-antecedent rule

A rdf:type X, X rdfs:subClassOf Y => A rdf:type Y
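To make the join concrete, here is a minimal MapReduce-style sketch in Python (illustrative, not the paper's actual implementation): both antecedent triples are mapped to the same key, the shared term X, so a single reducer sees every matching pair and can emit the derived triples.

```python
from collections import defaultdict

# Example input: one instance triple and one schema triple.
triples = [
    ("alice", "rdf:type", "Student"),
    ("Student", "rdfs:subClassOf", "Person"),
]

def map_triple(s, p, o):
    # Both antecedents are keyed on the shared term X.
    if p == "rdf:type":
        yield (o, ("instance", s))        # X is the object of rdf:type
    elif p == "rdfs:subClassOf":
        yield (s, ("superclass", o))      # X is the subject of subClassOf

def reduce_join(key, values):
    instances = [v for tag, v in values if tag == "instance"]
    superclasses = [v for tag, v in values if tag == "superclass"]
    # Every (instance, superclass) pair yields one derived triple.
    for a in instances:
        for y in superclasses:
            yield (a, "rdf:type", y)

groups = defaultdict(list)
for t in triples:
    for key, value in map_triple(*t):
        groups[key].append(value)

derived = [t for key, values in groups.items() for t in reduce_join(key, values)]
print(derived)  # [('alice', 'rdf:type', 'Person')]
```

Keying both antecedents on the shared term is what makes a multi-antecedent rule expressible as one Map and one Reduce.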

Page 16: WebPIE reasoning engine

In this paper the researchers extend their previous work to support OWL. They introduce a new massively scalable OWL reasoning engine called WebPIE, which deals with the complex OWL entailment rules.

Page 17: OWL challenges

OWL poses several challenges that WebPIE must overcome:

1. No rule ordering.

2. Joins between multiple instance triples.

3. Duplicate derivations.

4. Multiple joins per rule.

Page 18: OWL Horst fragment

The authors chose to work on the Horst fragment of OWL: it is the standard used in industry, it can be expressed as a rule set, and it strikes a balance between OWL Full and the more limited RDFS.

The OWL Horst rule-set (known as pD*) consists of two parts: the RDFS rules (the D part) and 16 additional rules (the p part).

Page 19: (Slide content not included in the transcript.)

Page 20: OWL Horst fragment

The researchers explored the p rule-set and noticed that some rules can be implemented using the optimizations introduced in the RDFS reasoning. Furthermore, they found that rules 1 and 2 are straightforward to implement by partitioning on subject and predicate.

All the other rules need a custom algorithm: transitivity, sameAs, someValuesFrom and allValuesFrom (a minimal sketch of the transitivity case is given below).
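As a rough illustration of why transitivity needs a custom, iterative treatment, here is a minimal fixpoint sketch in Python. It is a naive stand-in, not the paper's optimized algorithm: each pass performs a self-join on the shared middle term, and the loop stops when no new pair is derived.

```python
def transitive_closure(pairs):
    # Compute the closure of a transitive property by repeatedly joining
    # (a, b) with (b, c) until a fixpoint is reached.
    closure = set(pairs)
    while True:
        # Join step: index by subject, then match each pair's object
        # against it (the shared middle term b).
        by_subject = {}
        for a, b in closure:
            by_subject.setdefault(a, set()).add(b)
        new = {(a, c)
               for a, b in closure
               for c in by_subject.get(b, ())} - closure
        if not new:          # fixpoint: nothing new was derived
            return closure
        closure |= new

print(sorted(transitive_closure({("a", "b"), ("b", "c"), ("c", "d")})))
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```

Note how the same pair can be derived along several paths; this is the duplicate-derivation problem listed on page 17, which WebPIE must handle efficiently at scale.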

Pages 21-22: (Slide content not included in the transcript.)

Page 23: Experiment

They ran their experiments on a Hadoop cluster of 46 nodes (each equipped with a dual-core 2.4 GHz CPU, 4 GB RAM, a 250 GB hard disk, and a Gigabit Ethernet interconnect).

They used three data-sets: 1. UniProt (1.51 billion triples). 2. LDSR (0.9 billion triples). 3. LUBM (up to 100 billion triples).

Page 24: Results

The experimental results on the data-sets were as follows: UniProt processing took 6.1 hours, and LDSR processing took 3.52 hours. These results show that this implementation outperforms other current systems in the same field.

Page 25: Conclusions

After running the experiments, the researchers observed that the throughput is higher (almost 0.30) for larger data-sets, that the execution time depends on the complexity of the input, and that scalability is linear with respect to both the input size and the number of nodes.

Page 26: Criticism

The LUBM results are missing from the Results section. The method they introduced can be implemented straightforwardly using MapReduce on Hadoop to parallelize the processing, but it will be expensive.

Page 27: Questions? Thank you