Mulgara

Page 1: Mulgara

Mulgara

Open Source Semantic Web

Paul Gearon

[email protected]

Page 2: Mulgara

Mulgara

RDF Database

Open Source

Written in Java

Over 320,000 lines of code

Page 4: Mulgara

Math in Mulgara

Graphs and Trees (Chapter 5)

Recursive Algorithms (Chapter 2)

Functions (Chapter 4)

Formal Languages and Algebras (Chapter 8)

Graph Algorithms (Chapter 6)

Prolog, Rules, Logic (Chapter 1)

Sets (Chapter 3)

All programming uses Boolean Logic (Chapter 7)

Page 5: Mulgara

RDF

A simple Description Logic (Chapter 1)

Provides structure for data in the Semantic Web

Simple data format of binary predicates, or Triples

Triples combine to form a directed graph

Page 6: Mulgara

RDF

Simple

Describes schemas, ontologies, and instance data

Foundation for complex logic systems like OWL

Describes relationships between arbitrary things

Forms a graph (Chapter 5)

Can be used to describe anything

Page 7: Mulgara

RDF Triples

:David :knows :Paul

:knows(:David, :Paul)

[Diagram: node :David joined to node :Paul by a directed edge labeled :knows]

Page 9: Mulgara

RDF Graph

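[Figure: RDF graph. Nodes: :David, :Paul, :Person, the literals "Dr" and "David Smith", and mailto:[email protected]. Instance edges: :knows, rdf:type (twice), :email, :fullname, :title. Schema edges: rdfs:domain (four times) and rdfs:range (once), attached to the properties :knows, :email, :fullname and :title.]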

Page 11: Mulgara

Storage

Speed

Fast storage

Quickly find what we want

Index the data

Page 12: Mulgara

Persistent Storage

Must be efficient in space

Smaller means more data

Smaller means faster read/write

Must support re-writable data

Indexed, rewritable data usually means regular sized data blocks

Page 13: Mulgara

One Approach

Map URIs and strings to numbers

Map numbers back to URIs and strings

Store a triple as 3 numbers

See Adjacency Matrix (page 418) and Adjacency List (page 420)
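The mapping described above can be sketched as a small two-way lookup. This is an illustrative toy, not Mulgara's actual storage code: the class name `NodePool` and its methods are assumptions made for this example, and the numbering is chosen to match the worked example on the following slides.

```python
class NodePool:
    """Toy two-way map between RDF terms (URIs or strings) and integer ids.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        self._to_id = {}    # term -> number
        self._to_term = []  # position (number - 1) -> term

    def id_for(self, term):
        """Return the number for a term, allocating the next one if it is new."""
        if term not in self._to_id:
            self._to_term.append(term)
            self._to_id[term] = len(self._to_term)  # ids start at 1
        return self._to_id[term]

    def term_for(self, node_id):
        """Map a number back to its URI or string."""
        return self._to_term[node_id - 1]


pool = NodePool()
# Allocate ids in a fixed order so the numbers are predictable.
for term in ["rdf:type", ":knows", ":David", ":Paul", ":Person"]:
    pool.id_for(term)

statements = [
    (":David", "rdf:type", ":Person"),
    (":Paul", "rdf:type", ":Person"),
    (":David", ":knows", ":Paul"),
]

# Each triple becomes three numbers, and can be decoded again.
encoded = [tuple(pool.id_for(t) for t in s) for s in statements]
decoded = [tuple(pool.term_for(n) for n in s) for s in encoded]
```

Fixed-width numbers are what make regular-sized, rewritable data blocks practical: every stored triple takes the same space regardless of how long its URIs are.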

Page 14: Mulgara

Representation

[Diagram: nodes :David and :Paul each linked to :Person by rdf:type; :David linked to :Paul by :knows]

:David rdf:type :Person
:Paul rdf:type :Person
:David :knows :Paul

Page 15: Mulgara

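Representation

[Diagram: the same graph drawn with numbers: nodes 3, 4 and 5; edges labeled 1, 1 and 2]

Terms mapped to numbers:

1    rdf:type
2    :knows
3    :David
4    :Paul
5    :Person

Triples stored as numbers:

3 1 5    :David rdf:type :Person
4 1 5    :Paul rdf:type :Person
3 2 4    :David :knows :Paul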

Page 16: Mulgara

Finding Triples

Sort by columns

Each listing shows the columns as S P O.

S then P then O:
3 1 5
3 2 4
4 1 5

P then O then S:
3 1 5
4 1 5
3 2 4

O then S then P:
3 2 4
3 1 5
4 1 5
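A minimal sketch of why three sort orders are enough: whichever columns of a triple pattern are known, one of the three orderings has exactly those columns at the front, so the matches form one contiguous run. The names `ORDERS`, `indexes` and `find` are assumptions made for this example; in-memory sorted lists stand in for the on-disk indexes.

```python
from bisect import bisect_left, bisect_right

triples = [(3, 1, 5), (4, 1, 5), (3, 2, 4)]  # (S, P, O) as numbers

# Column positions, within an (S, P, O) tuple, for each sort order.
ORDERS = {"SPO": (0, 1, 2), "POS": (1, 2, 0), "OSP": (2, 0, 1)}

# Every index holds every triple, with columns rotated and rows sorted.
indexes = {
    name: sorted(tuple(t[col] for col in order) for t in triples)
    for name, order in ORDERS.items()
}


def find(s=None, p=None, o=None):
    """Return matching (S, P, O) triples; None means 'any value'."""
    pattern = (s, p, o)
    bound = sum(v is not None for v in pattern)
    # Choose the ordering whose leading columns are exactly the bound ones.
    for name, order in ORDERS.items():
        if all(pattern[col] is not None for col in order[:bound]):
            break
    prefix = tuple(pattern[col] for col in order[:bound])
    rows = indexes[name]
    # Rows sharing the prefix are adjacent, so two binary searches bracket them.
    lo = bisect_left(rows, prefix)
    hi = bisect_right(rows, prefix + (float("inf"),) * (3 - bound))
    results = []
    for row in rows[lo:hi]:
        spo = [None] * 3
        for pos, col in enumerate(order):
            spo[col] = row[pos]  # rotate columns back to S, P, O
        results.append(tuple(spo))
    return results
```

For example, `find(p=1)` uses the P-O-S ordering, while `find(s=3, o=5)` uses O-S-P because that is the only ordering that starts with both known columns.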

Page 19: Mulgara

Disk Structure

Linear layouts do not scale

Page 22: Mulgara

Disk Structure

Linear layouts do not scale

Trees scale well

Page 24: Mulgara

Trees

Scale well

Basis of every major database

Fast writing

Fast reading

Can be split over a network

Page 25: Mulgara

Index Searches

Use a binary tree search on the trees (page 456)

Logarithmic complexity

Blocks of data are stored in tree nodes as sorted data

Use a binary search on sorted data blocks (page 138)

Logarithmic complexity
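The two-step search above can be sketched as follows. This is a simplified stand-in, not the real on-disk structure: a sorted list of each block's first key plays the role of the tree descent, and the block contents and the name `contains` are made up for this example.

```python
from bisect import bisect_left, bisect_right

# Sorted blocks standing in for tree nodes; each block is itself sorted.
blocks = [
    [(1, 1, 5), (1, 2, 4), (2, 1, 5)],
    [(3, 1, 5), (3, 2, 4), (4, 1, 5)],
    [(6, 1, 5), (7, 2, 3), (9, 1, 5)],
]
first_keys = [block[0] for block in blocks]


def contains(triple):
    """Check whether a triple is stored, using two binary searches."""
    # Step 1: locate the last block whose first key is <= the triple
    # (the step a tree search would perform by descending from the root).
    b = bisect_right(first_keys, triple) - 1
    if b < 0:
        return False  # smaller than everything stored
    # Step 2: binary search inside that sorted block.
    block = blocks[b]
    i = bisect_left(block, triple)
    return i < len(block) and block[i] == triple
```

Both steps halve the remaining candidates at each comparison, which is where the logarithmic complexity on the slide comes from.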

Page 26: Mulgara

Why Binary?

Wider trees have identical complexity (logarithmic)

Wider trees have fewer disk seeks

This is only a constant-factor (linear) improvement, which does not change the complexity class

Wider trees have complex rebalancing
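The "identical complexity" claim follows from the change-of-base rule for logarithms. As a worked example, with a branching factor of 256 and one million entries chosen purely for illustration:

```latex
% Height of a balanced tree with branching factor b over n entries
h_b(n) \approx \log_b n = \frac{\log_2 n}{\log_2 b}

% Example: n = 10^6 entries
h_2(10^6) \approx \log_2 10^6 \approx 19.9
\qquad
h_{256}(10^6) \approx \frac{19.9}{8} \approx 2.5

% The ratio is the constant \log_2 b, so both heights are O(\log n)
\frac{h_2(n)}{h_b(n)} = \log_2 b
```

The wider tree needs fewer node visits, and so fewer disk seeks, but only by the fixed factor log_2 b, which is why it stays in the same complexity class.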

Page 29: Mulgara

Better than Trees?

Hash Tables (pages 362-366)

Have constant complexity, BUT:

Use too much space (scaling issues)

Need to be expanded when they get too full

Great for smaller data sets in RAM

Poor for disk usage, but good for clusters

Page 30: Mulgara

Mapping

Bijective Function (page 339)

Store key/value pair, indexed by key

Trees order by key

Datatype ordering (lexical, numerical, dates, etc.)

Can find ranges of data

Find all students enrolled between 1-1-2010 and 31-12-2010

Hashmaps have no ordering
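A short sketch of the range search that ordering makes possible. The student names and dates are invented sample data, and a sorted Python list stands in for the tree. A plain `dict` keyed by date could answer "who enrolled on exactly this day" but would have to scan every key to answer a range question.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Hypothetical (enrolment date, student) pairs, kept sorted by key
# the way an ordered tree index would keep them.
enrolments = sorted([
    (date(2009, 9, 1), "alice"),
    (date(2010, 2, 15), "bob"),
    (date(2010, 8, 30), "carol"),
    (date(2011, 1, 10), "dave"),
])
keys = [key for key, _ in enrolments]


def enrolled_between(start, end):
    """Return students whose enrolment date lies in [start, end]."""
    lo = bisect_left(keys, start)   # first key >= start
    hi = bisect_right(keys, end)    # one past the last key <= end
    return [name for _, name in enrolments[lo:hi]]


students_2010 = enrolled_between(date(2010, 1, 1), date(2010, 12, 31))
```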

Page 31: Mulgara

Real Data Searches

Combination searches

“The list of people who know :Paul”

The list of people

AND

Things that know :Paul

Page 32: Mulgara

Constraints

Bind a variable with a constraint

?x rdf:type :Person

?x :knows :Paul

Describe requirements with a formal language

Tucana Query Language (TQL)

SPARQL Protocol and RDF Query Language (SPARQL)
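To make the idea of binding a variable concrete, here is a minimal sketch that resolves each constraint against a small set of triples and then combines the results. The function names `resolve` and `conjunction` are assumptions for this example, and the nested-loop join is chosen for brevity rather than speed.

```python
triples = [
    (":David", "rdf:type", ":Person"),
    (":Paul", "rdf:type", ":Person"),
    (":David", ":knows", ":Paul"),
]


def is_var(term):
    return term.startswith("?")


def resolve(pattern):
    """Return one {variable: value} binding per triple matching the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for want, have in zip(pattern, triple):
            if is_var(want):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(want, have) != have:
                    break
            elif want != have:
                break
        else:
            results.append(binding)
    return results


def conjunction(left, right):
    """AND: keep pairs of bindings that agree on their shared variables."""
    joined = []
    for a in left:
        for b in right:
            if all(a[v] == b[v] for v in a.keys() & b.keys()):
                joined.append({**a, **b})
    return joined


people = resolve(("?x", "rdf:type", ":Person"))
know_paul = resolve(("?x", ":knows", ":Paul"))
answer = conjunction(people, know_paul)
```

Here `people` binds `?x` to both :David and :Paul, `know_paul` binds it only to :David, and the conjunction keeps the binding the two constraints share.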

Page 33: Mulgara

Query Languages

Formal language

Context Free Grammar

Chapter 8, section 8.4

SPARQL example:

SELECT ?person
WHERE { ?person a :Person . ?person :knows :Paul }

Page 34: Mulgara

Algebra

Formal language converted to an Algebra (Section 8.1)

Constraints are combined and manipulated algebraically

Optimization through algebraic manipulation

Example:

before optimization: ~600 seconds

after optimization: 0.8 seconds

Page 35: Mulgara

Algebraic Operations

AND operations (Conjunctions)

Mergesort (page 179)

OR operations (Disjunctions)

Union then sort (Chapter 2)

Others

Filter, Minus, LeftJoin, Datatype, etc...
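One reading of the AND and OR bullets above, sketched on sorted lists of node numbers: a conjunction reuses the merge step from mergesort but keeps only values present on both sides, and a disjunction takes the union and sorts it. This is an interpretation for illustration, with hypothetical function names, not a description of Mulgara's actual join code.

```python
def merge_join(left, right):
    """AND: intersect two sorted lists of unique values in a single pass."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            out.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1  # left value cannot match anything later on the right
        else:
            j += 1
    return out


def union_sorted(left, right):
    """OR: combine both inputs, drop duplicates, then sort."""
    return sorted(set(left) | set(right))


# Using the earlier numbering: 3 = :David, 4 = :Paul
people = [3, 4]       # ?x rdf:type :Person
know_paul = [3]       # ?x :knows :Paul

both = merge_join(people, know_paul)
either = union_sorted(people, know_paul)
```

Because both inputs are already sorted, the merge visits each element at most once, which is why sorted indexes make conjunctions cheap.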

Page 36: Mulgara

Graph Operations

List operations

Graph traversal

Transitivity (page 289)

Distance between nodes

Algorithm similar to Euler Path (page 490) on constrained graph
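As an illustration of transitivity and distance between nodes, the sketch below uses a plain breadth-first search. Note that this is a swapped-in technique, not the Euler-path-like algorithm the slide mentions, and the graph and node names are invented for the example.

```python
from collections import deque

# Hypothetical graph for a single predicate: node -> nodes it points to.
edges = {":a": [":b"], ":b": [":c"], ":c": [":d"], ":d": []}


def distances_from(start):
    """Breadth-first search: number of edges from start to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in dist:  # first visit is the shortest path
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def reachable_from(start):
    """Transitive step: everything reachable by following one or more edges."""
    return set(distances_from(start)) - {start}


dist_a = distances_from(":a")
after_b = reachable_from(":b")
```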

Page 37: Mulgara

Ontologies

Formal representation of knowledge

Set of concepts in a domain

Relationship between concepts

Vocabularies for building ontologies expressed in RDF

RDF Schema (RDFS)

Simple Knowledge Organization System (SKOS)

Web Ontology Language (OWL)

Page 38: Mulgara

Rules

RDF has few semantics

Support for higher languages through Rules (page 72)

Uses a Prolog-style language (pages 64-71) to express Horn clauses (page 66), and therefore modus ponens (page 23)

RDFS, SKOS and most of OWL all supported through Rules

Page 39: Mulgara

Rule Examples

P(B,X) :- owl:sameAs(A,B), P(A,X).

P(B,A) :- owl:SymmetricProperty(P), P(A,B).

owl:SymmetricProperty(owl:sameAs).

If A is the same as B (owl:sameAs), and A relates to X, then B relates to X in the same way.

If P is a symmetric property (owl:SymmetricProperty), and P relates A to B, then P also relates B to A.

owl:sameAs is a symmetric property.
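The three rules above can be exercised with a naive forward-chaining loop that keeps applying them until nothing new is derived. This is a teaching sketch only: the function `apply_rules` and the resource `:PaulG` are made up for the example, and the unary predicate `owl:SymmetricProperty(P)` is written as the triple `P rdf:type owl:SymmetricProperty`.

```python
def apply_rules(facts):
    """Apply the sameAs and symmetric-property rules until a fixed point."""
    facts = set(facts)
    while True:
        new = set()
        # P(B,A) :- owl:SymmetricProperty(P), P(A,B).
        symmetric = {
            s for (s, p, o) in facts
            if p == "rdf:type" and o == "owl:SymmetricProperty"
        }
        for (a, p, b) in facts:
            if p in symmetric:
                new.add((b, p, a))
        # P(B,X) :- owl:sameAs(A,B), P(A,X).
        same = [(a, b) for (a, p, b) in facts if p == "owl:sameAs"]
        for (a, b) in same:
            for (s, p, x) in facts:
                if s == a:
                    new.add((b, p, x))
        if new <= facts:  # nothing new was derived
            return facts
        facts |= new


facts = {
    ("owl:sameAs", "rdf:type", "owl:SymmetricProperty"),
    (":Paul", "owl:sameAs", ":PaulG"),   # :PaulG is a hypothetical alias
    (":Paul", ":knows", ":David"),
}
closure = apply_rules(facts)
```

The loop derives that `:PaulG` is the same as `:Paul` (symmetry) and that `:PaulG` also knows `:David` (sameAs). It terminates because only finitely many triples can be built from the terms present.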

Page 40: Mulgara

OWL Properties

Transitive Properties (page 289)

Symmetric/Asymmetric Properties (page 289)

Reflexive/Irreflexive Properties (page 289)

Functional/Inverse-Functional Properties (page 341)

Property inverses (page 342)

Disjoint properties (page 195)

Page 41: Mulgara

OWL Classes

Defined with Set semantics (Chapter 3, section 3.1)

Handles both instance data (set membership, pg 187) and set descriptions

Types are described with a Unary Predicate (pg 36, 188)

RDF represents this with the predicate rdf:type

Existential (pg 36), Universal (pg 35), Complementary (pg 195), Cardinality (pg 320), and Datatype operations

Page 42: Mulgara

OWL in Use

Represent schemas, similar to database schemas

Automated research for candidate drug treatments

NASA inventories