OWL reasoning with WebPIE: calculating the closure of 100 billion triples


1+OWL reasoning with WebPIE: calculating the closure of 100 billion triples

Presented by: Mahdi Atawna

2+Outline

Introduction. Paper motivation. Methodology. MapReduce. WebPIE and OWL challenges. Experiment. Results and conclusion. Criticism.

3+About the paper

Authored by Jacopo Urbani, Spyros Kotoulas, Jason Maassen, Frank van Harmelen, and Henri Bal from Vrije Universiteit Amsterdam. It is an extension of a previously published paper, "Scalable Distributed Reasoning using MapReduce" (2009), which focused on reasoning over RDFS data only.

This paper, published in 2010, extends the approach introduced in the earlier paper to handle the more complex OWL semantics.

4+Definitions

Semantic Reasoner: a piece of software able to infer logical consequences from a set of asserted facts or axioms.

MapReduce: a programming model that allows for massive scalability across a large number of servers in a cluster.

5+Paper motivation

Most previous reasoning methods share one problem: they are centralized. Performance depends on improving the hardware and data structures of a single computer, an approach that reaches its limit quickly on large data-sets.

6+Research problem

Develop a method to handle large-scale data. The new method will use a scalable distributed approach that performs the processing in parallel. With this approach, performance can be scaled in two dimensions: first by the hardware of each node, and second by the number of nodes.

7+OWL & RDF

[figure-only slide contrasting OWL and RDFS]

8+Methodology

The researchers present a new method to handle large-scale data using a scalable distributed approach that performs the processing in parallel.

With this approach, performance can be scaled in two dimensions: the hardware of each node and the number of nodes.

To realize this approach, they used MapReduce: a programming model that allows for massive scalability across a large number of servers in a cluster.

9+MapReduce!

The term MapReduce refers to two separate tasks:

Map: takes a large set of data and breaks it down into tuples (key/value pairs).

Reduce: runs after Map, takes the Map output as input, and reduces it to a smaller set of tuples by combining the tuples that share a key.

10+MapReduce Example:

[slides 10–13 are figure-only: a step-by-step MapReduce example]
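Since the example slides are figure-only, here is a minimal word-count sketch as a textual stand-in: plain Java with no Hadoop dependency, and purely illustrative names (this is not WebPIE code). It shows the Map step emitting key/value tuples, the shuffle grouping them by key, and the Reduce step combining each group.

    import java.util.*;
    import java.util.stream.*;

    // Minimal in-memory word count, mirroring the Map and Reduce
    // steps described on the previous slide. Plain Java, no Hadoop;
    // all names here are illustrative.
    public class WordCount {
        public static void main(String[] args) {
            List<String> documents = List.of(
                "the quick brown fox",
                "the lazy dog",
                "the quick dog");

            // Map: break each document into (word, 1) tuples.
            List<Map.Entry<String, Integer>> mapped = documents.stream()
                .flatMap(doc -> Arrays.stream(doc.split(" ")))
                .map(word -> Map.entry(word, 1))
                .collect(Collectors.toList());

            // Shuffle: group tuples by key (Hadoop does this automatically).
            Map<String, List<Integer>> grouped = mapped.stream()
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));

            // Reduce: combine the values of each key into a single tuple.
            Map<String, Integer> counts = new TreeMap<>();
            grouped.forEach((word, ones) ->
                counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));

            counts.forEach((w, c) -> System.out.println(w + " -> " + c));
        }
    }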

14+The previous paper

It focused on RDFS: the closure of the RDF input can be computed, reaching a fixpoint, by applying all rules repeatedly until no new data is derived.

This is easy to implement for single-antecedent rules, but rules with multiple antecedents are challenging to implement because they require a join between the related triples.

15+Example of a multi-antecedent rule:

A rdf:type X, X rdfs:subClassOf Y
=> A rdf:type Y
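To make the join concrete, here is a minimal in-memory sketch of how a MapReduce-style job could implement this rule: both antecedents are keyed on the shared term X, and the reduce phase pairs them to derive the new triples. Plain Java with illustrative data; WebPIE's actual implementation differs (for example, it loads schema triples into memory on every node).

    import java.util.*;

    // Sketch of the multi-antecedent join for
    //   A rdf:type X, X rdfs:subClassOf Y => A rdf:type Y
    public class SubClassJoin {
        record Triple(String s, String p, String o) {}

        public static void main(String[] args) {
            List<Triple> input = List.of(
                new Triple("alice", "rdf:type", "Student"),
                new Triple("Student", "rdfs:subClassOf", "Person"),
                new Triple("Person", "rdfs:subClassOf", "Agent"));

            // Map: key both antecedent patterns on the shared term X.
            Map<String, List<String>> typeOf = new HashMap<>();   // X -> instances A
            Map<String, List<String>> superOf = new HashMap<>();  // X -> superclasses Y
            for (Triple t : input) {
                if (t.p().equals("rdf:type"))
                    typeOf.computeIfAbsent(t.o(), k -> new ArrayList<>()).add(t.s());
                if (t.p().equals("rdfs:subClassOf"))
                    superOf.computeIfAbsent(t.s(), k -> new ArrayList<>()).add(t.o());
            }

            // Reduce: for each key X, pair every instance A with every superclass Y.
            for (String x : typeOf.keySet())
                for (String a : typeOf.get(x))
                    for (String y : superOf.getOrDefault(x, List.of()))
                        System.out.println(a + " rdf:type " + y);
            // Prints: alice rdf:type Person
            // (a second pass over the enlarged data would also derive:
            //  alice rdf:type Agent)
        }
    }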

16+WebPIE reasoning engine

In this paper the researchers extend their previous work to support OWL.

They introduce a new massively scalable OWL reasoning engine called "WebPIE" that deals with the complex OWL entailment rules.

17+OWL challenges

OWL poses several challenges that WebPIE must overcome, such as:

1. No rule ordering.

2. Joins between multiple instance triples.

3. Duplicate derivations (see the sketch after this list).

4. Multiple joins per rule.
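For challenge 3, a minimal sketch of how MapReduce-style grouping can eliminate duplicate derivations: keying every derived triple on itself sends all copies to the same reduce group, which emits the triple exactly once. Plain Java with illustrative data; WebPIE's actual duplicate handling is more involved.

    import java.util.*;
    import java.util.stream.*;

    // Sketch of duplicate elimination by grouping on the triple itself.
    public class DedupDerivations {
        public static void main(String[] args) {
            List<String> derived = List.of(
                "alice rdf:type Person",
                "alice rdf:type Person",   // same triple derived by two rules
                "bob rdf:type Agent");

            // Map: key = the triple itself. Shuffle: identical triples
            // meet in one group. Reduce: emit one triple per group.
            derived.stream()
                .collect(Collectors.groupingBy(t -> t, Collectors.counting()))
                .forEach((triple, copies) -> System.out.println(triple));
        }
    }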

18+OWL Horst fragment

The authors chose to work on the Horst fragment of the OWL rule-set: it is the standard used in industry, it can be expressed as a rule set, and it strikes a balance between full OWL and the more limited RDFS.

The OWL Horst rule-set (known as pD*) consists of two parts: 1. the RDFS rules (the D part), 2. 16 additional rules (the p part).


20+OWL Horst fragment

The researchers explored the p rule-set and noticed that some rules can be implemented using the optimizations introduced in the RDFS reasoning. Furthermore, they found that rules 1 and 2 are straightforward to implement by partitioning on subject and predicate.

All other rules need a custom algorithm; these rules concern: transitivity, sameAs, someValuesFrom and allValuesFrom. A sketch of a transitivity pass follows.
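As a minimal illustration of the transitivity case, here is a sketch of closing a transitive property (e.g. rdfs:subClassOf) by repeated self-joins until a fixpoint, the way an iterative MapReduce job would run one join per pass. Plain Java with illustrative names; WebPIE's actual transitivity algorithm is more elaborate.

    import java.util.*;

    // Transitive closure by repeated self-join until a fixpoint.
    public class TransitiveClosure {
        record Pair(String from, String to) {}

        public static void main(String[] args) {
            Set<Pair> closure = new HashSet<>(List.of(
                new Pair("A", "B"), new Pair("B", "C"), new Pair("C", "D")));

            boolean changed = true;
            while (changed) {               // one MapReduce pass per iteration
                // Join: key pairs on the shared middle element.
                Set<Pair> derived = new HashSet<>();
                for (Pair left : closure)
                    for (Pair right : closure)
                        if (left.to().equals(right.from()))
                            derived.add(new Pair(left.from(), right.to()));
                // Fixpoint check: stop when no new pairs are derived.
                changed = closure.addAll(derived);
            }
            closure.forEach(p -> System.out.println(p.from() + " -> " + p.to()));
        }
    }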


23+Experiment

They ran their experiments on a Hadoop cluster of 46 nodes, each equipped with a dual-core 2.4 GHz CPU, 4 GB RAM, a 250 GB hard disk, and a Gigabit Ethernet interconnect.

They used three data-sets: 1. UniProt (1.51 billion triples), 2. LDSR (0.9 billion triples), 3. LUBM (up to 100 billion triples).

24+Results

The experimental results on the data-sets were as follows: UniProt processing took 6.1 hours, LDSR processing took 3.52 hours. These results show that this implementation outperforms other current systems in the same field.

25+Conclusions

After running the experiments, the researchers observed that the throughput is higher (almost 0.30) for larger data-sets, that the execution time depends on the complexity of the input, and that scalability is linear with respect to both the input size and the number of nodes.

26+Criticism

Results for the LUBM data-set are missing from the results section. The method they introduced can be easily implemented using MapReduce on Hadoop to parallelize the processing, but it will be expensive.

27+Questions? Thank you