Mulgara: Open Source Semantic Web
Paul Gearon
Mulgara
RDF Database
Open Source
Written in Java
Over 320,000 lines of code
Math in Mulgara
Graphs and Trees (Chapter 5)
Recursive Algorithms (Chapter 2)
Functions (Chapter 4)
Formal Languages and Algebras (Chapter 8)
Graph Algorithms (Chapter 6)
Prolog, Rules, Logic (Chapter 1)
Sets (Chapter 3)
All programming uses Boolean Logic (Chapter 7)
RDF
A simple Description Logic (Chapter 1)
Provides structure for data in the Semantic Web
Simple data format of binary predicates, or Triples
Triples combine to form a directed graph
RDF
Simple
Describes schemas, ontologies, and instance data
Foundation for complex logic systems like OWL
Describes relationships between arbitrary things
Forms a graph (Chapter 5)
Can be used to describe anything
RDF Triples
:David :knows :Paul
:knows(:David, :Paul)
[Diagram: an arrow labelled :knows from :David to :Paul]
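A minimal sketch of the two readings of a triple on this slide, assuming triples are held as plain (subject, predicate, object) tuples and using the three triples that appear in these slides. `holds` reads a triple as a binary predicate; `out_edges` reads the same set as a directed, labelled graph. Both helper names are hypothetical.

```python
# Triples as (subject, predicate, object); prefixed names are kept as plain strings.
triples = {
    (":David", "rdf:type", ":Person"),
    (":Paul", "rdf:type", ":Person"),
    (":David", ":knows", ":Paul"),
}

def holds(predicate, subject, obj):
    """Predicate reading: predicate(subject, obj) is true iff the triple exists."""
    return (subject, predicate, obj) in triples

def out_edges(node):
    """Graph reading: the labelled edges leaving `node`, sorted for stable output."""
    return sorted((p, o) for (s, p, o) in triples if s == node)

print(holds(":knows", ":David", ":Paul"))  # True
print(holds(":knows", ":Paul", ":David"))  # False: edges are directed
print(out_edges(":David"))
```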
RDF Graph
[Diagram: :David and :Paul each have rdf:type :Person, :David :knows :Paul, and :fullname and :title edges lead to the literal "Dr David Smith" and a mailto: URI]
[Diagram, extended: the properties :knows, :fullname and :title are linked to :Person by rdfs:domain and rdfs:range edges]
Storage
Speed
Fast storage
Quickly find what we want
Index the data
Persistent Storage
Must be efficient in space
Smaller means more data
Smaller means faster read/write
Must support re-writable data
Indexed, rewritable data usually means regular-sized data blocks
One Approach
Map URIs and strings to numbers
Map numbers back to URIs and strings
Store a triple as 3 numbers
See Adjacency Matrix (page 418) and Adjacency List (page 420)
[Diagram: :David and :Paul each linked to :Person by rdf:type; :David linked to :Paul by :knows]
Representation
:David rdf:type :Person
:Paul rdf:type :Person
:David :knows :Paul
[Diagram: the same graph with every node and edge label replaced by its number]
rdf:type = 1, :knows = 2, :David = 3, :Paul = 4, :Person = 5
3 1 5
4 1 5
3 2 4
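The mapping above (URIs to numbers and back, then each triple stored as 3 numbers) can be sketched as a bidirectional dictionary. `NodePool` is a hypothetical name for this illustration, not Mulgara's actual class; IDs are allocated in the slide's order so the encoded triples match the numbers shown.

```python
class NodePool:
    """Hypothetical bijection between URIs/strings and integer IDs (starting at 1)."""

    def __init__(self):
        self._ids = {}    # URI or string -> ID
        self._names = []  # ID - 1 -> URI or string

    def id_for(self, name):
        # Allocate the next ID the first time a name is seen.
        if name not in self._ids:
            self._names.append(name)
            self._ids[name] = len(self._names)
        return self._ids[name]

    def name_for(self, node_id):
        return self._names[node_id - 1]

pool = NodePool()
for name in ["rdf:type", ":knows", ":David", ":Paul", ":Person"]:
    pool.id_for(name)  # same numbering as on the slide

triples = [
    (":David", "rdf:type", ":Person"),
    (":Paul", "rdf:type", ":Person"),
    (":David", ":knows", ":Paul"),
]
encoded = [tuple(pool.id_for(term) for term in t) for t in triples]
decoded = [tuple(pool.name_for(i) for i in t) for t in encoded]
print(encoded)  # [(3, 1, 5), (4, 1, 5), (3, 2, 4)]
```

Because the map is a bijection, decoding always recovers the original triples, and every stored triple has the same fixed size.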
Finding Triples
Sort by columns (each row lists the three encoded triples as S P O):
S then P then O: 3 1 5 | 3 2 4 | 4 1 5
P then O then S: 3 1 5 | 4 1 5 | 3 2 4
O then S then P: 3 2 4 | 3 1 5 | 4 1 5
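A sketch of why the three sort orders above are enough, assuming each index is an in-memory sorted list rather than an on-disk tree. Every combination of bound positions (S, P, O, S+P, P+O, O+S) is a leading prefix of one of the rotations SPO, POS or OSP, so a lookup is a binary search for that prefix. `find`, `ORDERS` and `INDEXES` are hypothetical names.

```python
from bisect import bisect_left, bisect_right

# Encoded triples in S, P, O order, using the IDs from the slides.
triples = [(3, 1, 5), (4, 1, 5), (3, 2, 4)]

# Each index holds the same triples with columns rotated so its sort key leads.
ORDERS = {"SPO": (0, 1, 2), "POS": (1, 2, 0), "OSP": (2, 0, 1)}
INDEXES = {
    name: sorted(tuple(t[i] for i in perm) for t in triples)
    for name, perm in ORDERS.items()
}

def find(s=None, p=None, o=None):
    """Return (index used, matching triples in S, P, O order); None means unbound."""
    query = (s, p, o)
    bound = {i for i, v in enumerate(query) if v is not None}
    # Some rotation always has exactly the bound columns as its leading columns.
    name, perm = next(
        (n, pm) for n, pm in ORDERS.items() if set(pm[:len(bound)]) == bound
    )
    prefix = tuple(query[i] for i in perm[:len(bound)])
    rows = INDEXES[name]
    lo = bisect_left(rows, prefix)
    hi = bisect_right(rows, prefix + (float("inf"),) * (3 - len(prefix)))
    results = []
    for row in rows[lo:hi]:
        triple = [0, 0, 0]
        for pos, col in enumerate(perm):
            triple[col] = row[pos]  # undo the rotation
        results.append(tuple(triple))
    return name, results

print(find(p=1))  # ('POS', [(3, 1, 5), (4, 1, 5)])
```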
Disk Structure
Linear layouts do not scale
Trees scale well
Trees
Scale well
Basis of most major databases
Fast writing
Fast reading
Can be split over a network
Index Searches
Use a binary tree search on the trees (page 456)
Logarithmic complexity
Tree nodes hold blocks of sorted data
Use a binary search on sorted data blocks (page 138)
Logarithmic complexity
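The two-step search above (binary tree search to the right node, then binary search inside that node's sorted block) can be sketched as follows. This is a simplified, static illustration, assuming a balanced binary tree built once from pre-sorted blocks with a hypothetical block size of 8; it does no rebalancing or disk I/O and is not Mulgara's storage code.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    block: list                      # sorted keys stored in this tree node
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build(blocks):
    """Build a balanced binary tree over blocks that are already in key order."""
    if not blocks:
        return None
    mid = len(blocks) // 2
    return Node(blocks[mid], build(blocks[:mid]), build(blocks[mid + 1:]))

def contains(root, key):
    node = root
    # Step 1: binary tree search, comparing the key with each block's range.
    while node is not None:
        if key < node.block[0]:
            node = node.left
        elif key > node.block[-1]:
            node = node.right
        else:
            # Step 2: binary search inside the sorted block.
            i = bisect_left(node.block, key)
            return i < len(node.block) and node.block[i] == key
    return False

keys = list(range(0, 200, 2))                               # 100 even numbers
blocks = [keys[i:i + 8] for i in range(0, len(keys), 8)]    # hypothetical block size
root = build(blocks)
print(contains(root, 42), contains(root, 43))  # True False
```

Both steps are logarithmic, so the whole lookup stays logarithmic in the number of keys.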
Why Binary?
Wider trees have identical complexity (logarithmic)
Wider trees have fewer disk seeks
Only a constant-factor gain, which does not change the complexity class
Wider trees have complex rebalancing
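The constant-factor argument can be made concrete, assuming a tree with fan-out $b$ holding $n$ keys and one disk seek per level. Widening the tree divides the height by $\log_2 b$, a constant, so the complexity stays $O(\log n)$:

```latex
h_b(n) \approx \log_b n = \frac{\log_2 n}{\log_2 b}

n = 2^{30}: \quad h_2 \approx 30, \qquad h_{1024} \approx \frac{30}{10} = 3
```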
Better than Trees?
Hash Tables (pages 362-366)
Have constant complexity, BUT:
Use too much space (scaling issues)
Need to be expanded when they get too full
Great for smaller data sets in RAM
Poor for disk usage, good for clusters
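A toy hash table illustrating both points above: lookups are expected constant time while chains stay short, and the table must periodically be expanded, rehashing every entry. Separate chaining and the 0.75 load-factor threshold are assumptions chosen for the sketch; `ChainedHashMap` is a hypothetical name.

```python
class ChainedHashMap:
    """Toy hash table with separate chaining; doubles when load factor exceeds 0.75."""

    def __init__(self, capacity=8):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    @property
    def capacity(self):
        return len(self._buckets)

    def _bucket(self, key):
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self._size += 1
        if self._size > 0.75 * len(self._buckets):
            self._resize(2 * len(self._buckets))

    def get(self, key, default=None):
        # Expected O(1): only one short chain is scanned.
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def _resize(self, capacity):
        # Expansion rehashes every entry: cheap in RAM, costly for data on disk.
        entries = [pair for bucket in self._buckets for pair in bucket]
        self._buckets = [[] for _ in range(capacity)]
        for k, v in entries:
            self._bucket(k).append((k, v))

m = ChainedHashMap()
for i in range(100):
    m.put(f":node{i}", i)
print(m.get(":node42"), m.capacity)  # 42 256
```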
Mapping
Bijective Function (page 339)
Store key/value pair, indexed by key
Trees order by key
Datatype ordering (lexical, numerical, dates, etc.)
Can find ranges of data
Find all students enrolled between 1-1-2010 and 31-12-2010
Hashmaps have no ordering
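The enrolment example above works on ordered keys because a range is two binary searches plus a contiguous slice. This sketch uses a sorted list as a stand-in for a tree ordered by date; the student records are hypothetical. A hash map would have to scan every entry to answer the same question.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Hypothetical enrolment records, kept sorted by date (the key).
enrolments = sorted([
    (date(2009, 11, 3), "Alice"),
    (date(2010, 2, 14), "Bob"),
    (date(2010, 7, 1), "Carol"),
    (date(2010, 12, 31), "Dan"),
    (date(2011, 1, 5), "Erin"),
])
keys = [d for d, _ in enrolments]

def enrolled_between(start, end):
    """Inclusive range query: locate both ends by binary search, then slice."""
    lo = bisect_left(keys, start)
    hi = bisect_right(keys, end)
    return [name for _, name in enrolments[lo:hi]]

print(enrolled_between(date(2010, 1, 1), date(2010, 12, 31)))  # ['Bob', 'Carol', 'Dan']
```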
Real Data Searches
Combination searches
“The list of people who know :Paul”
The list of people
AND
Things that know :Paul
Constraints
Bind a variable with a constraint
?x rdf:type :Person
?x :knows :Paul
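A sketch of binding `?x` with each constraint separately and then combining the results, assuming the small triple set used in these slides and constraints with exactly one variable. `bind` is a hypothetical helper; the set intersection at the end is the AND of "the list of people" and "things that know :Paul".

```python
triples = {
    (":David", "rdf:type", ":Person"),
    (":Paul", "rdf:type", ":Person"),
    (":David", ":knows", ":Paul"),
}

def bind(pattern):
    """Return every value the pattern's single variable (a '?' term) can take."""
    var_pos = [i for i, term in enumerate(pattern) if term.startswith("?")]
    if len(var_pos) != 1:
        raise ValueError("this sketch handles exactly one variable per constraint")
    (pos,) = var_pos
    return {
        t[pos]
        for t in triples
        if all(t[i] == term for i, term in enumerate(pattern) if i != pos)
    }

people = bind(("?x", "rdf:type", ":Person"))  # {':David', ':Paul'}
knows_paul = bind(("?x", ":knows", ":Paul"))  # {':David'}
print(people & knows_paul)                    # {':David'}
```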
Describe requirements with a formal language
Tucana Query Language (TQL)
SPARQL Protocol and RDF Query Language (SPARQL)
Query Languages
Formal language
Context Free Grammar
Chapter 8, section 8.4
SPARQL example:
SELECT ?person
WHERE { ?person a :Person . ?person :knows :Paul }
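To show what "context free grammar" means in practice, here is a recursive-descent parser for a tiny hypothetical grammar that covers only the SPARQL example above. It is a small subset for illustration and not the official SPARQL grammar; terms containing `.`, `{` or `}` (such as full URIs) are not handled.

```python
import re

# Hypothetical grammar (tiny SPARQL subset):
#   query  := "SELECT" var+ "WHERE" "{" triple ("." triple)* "."? "}"
#   triple := term term term
TOKEN = re.compile(r"[{}.]|[^\s{}.]+")

def parse(text):
    tokens = TOKEN.findall(text)
    pos = 0

    def expect(tok):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != tok:
            raise SyntaxError(f"expected {tok!r} at token {pos}")
        pos += 1

    def term():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in ("{", "}", "."):
            raise SyntaxError(f"expected a term at token {pos}")
        pos += 1
        return tokens[pos - 1]

    def triple():
        s, p, o = term(), term(), term()
        return (s, "rdf:type" if p == "a" else p, o)  # 'a' abbreviates rdf:type

    expect("SELECT")
    variables = []
    while pos < len(tokens) and tokens[pos].startswith("?"):
        variables.append(tokens[pos])
        pos += 1
    if not variables:
        raise SyntaxError("SELECT needs at least one variable")
    expect("WHERE")
    expect("{")
    patterns = [triple()]
    while pos < len(tokens) and tokens[pos] == ".":
        pos += 1
        if pos < len(tokens) and tokens[pos] == "}":
            break  # optional trailing '.'
        patterns.append(triple())
    expect("}")
    if pos != len(tokens):
        raise SyntaxError("unexpected input after '}'")
    return variables, patterns

print(parse("SELECT ?person WHERE { ?person a :Person . ?person :knows :Paul}"))
```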
Algebra
Formal language converted to an Algebra (Section 8.1)
Constraints are combined and manipulated algebraically
Optimization through algebraic manipulation
Example:
before optimization: ~600 seconds
after optimization: 0.8 seconds
Algebraic Operations
AND operations (Conjunctions)
Mergesort (page 179)
OR operations (Disjunctions)
Union then sort (Chapter 2)
Others
Filter, Minus, LeftJoin, Datatype, etc...
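A sketch of the two main operations on lists of bindings for a single variable, using the encoded IDs from earlier (:David = 3, :Paul = 4). The AND is a merge-style join: once both inputs are sorted, one linear pass finds the common values. The OR follows the slide's "union then sort". Function names are hypothetical, and real joins bind several variables at once.

```python
def merge_join(left, right):
    """AND: intersect two sorted, duplicate-free lists in one linear pass."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            out.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return out

def union_then_sort(left, right):
    """OR: union the two binding lists, then sort the result."""
    return sorted(set(left) | set(right))

people = [3, 4]   # ?x rdf:type :Person
knows_paul = [3]  # ?x :knows :Paul
print(merge_join(people, knows_paul))           # [3]
print(union_then_sort([1, 3, 5], [2, 3, 6]))    # [1, 2, 3, 5, 6]
```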
Graph Operations
List operations
Graph traversal
Transitivity (page 289)
Distance between nodes
Algorithm similar to finding an Euler path (page 490) on a constrained graph
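Transitivity and distance can both be illustrated with a breadth-first search over one predicate's edges. This is a plain BFS, not the Euler-path-like algorithm mentioned above, and the :knows chain beyond :David and :Paul is hypothetical data. Every node the search reaches is an object of the transitive closure from the start node, and the BFS level is the hop distance.

```python
from collections import deque

# Hypothetical :knows edges as an adjacency list.
knows = {
    ":David": [":Paul"],
    ":Paul": [":Alice"],
    ":Alice": [":Bob"],
    ":Bob": [],
}

def distances(start):
    """Hop count from `start` to every node reachable along :knows edges."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in knows.get(node, []):
            if nxt not in dist:  # visited check also guards against cycles
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def transitive_knows(start):
    """Objects of the transitive closure from `start` (ignores cycles back to start)."""
    return set(distances(start)) - {start}

print(distances(":David"))  # {':David': 0, ':Paul': 1, ':Alice': 2, ':Bob': 3}
```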
Ontologies
Formal representation of knowledge
Set of concepts in a domain
Relationship between concepts
Vocabularies for building ontologies expressed in RDF
RDF Schema (RDFS)
Simple Knowledge Organization System (SKOS)
Web Ontology Language (OWL)
Rules
RDF itself has few built-in semantics
Higher-level languages are supported through Rules (page 72)
Rules use a Prolog-style language (pages 64-71) to express Horn clauses (page 66), which are applied by modus ponens (page 23)
RDFS, SKOS and most of OWL all supported through Rules
Rule Examples
P(B,X) :- owl:sameAs(A,B), P(A,X).
P(B,A) :- owl:SymmetricProperty(P), P(A,B).
owl:SymmetricProperty(owl:sameAs).
If A is the same as B (owl:sameAs), and A relates to X, then B relates to X the same way.
If P is a symmetric property (owl:SymmetricProperty), and P relates A to B, then P also relates B to A.
owl:sameAs is a symmetric property.
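A naive forward-chaining sketch of the three rules above, assuming triples are tuples and the unary predicate owl:SymmetricProperty(P) is written in RDF as (P, rdf:type, owl:SymmetricProperty). It applies modus ponens repeatedly until no new triples appear. This is an illustration only, not Mulgara's rule engine, and :DrSmith is a hypothetical resource.

```python
SYM = "owl:SymmetricProperty"

def infer(triples):
    """Apply the rules until a fixpoint is reached; returns the closed set of triples."""
    # Fact: owl:SymmetricProperty(owl:sameAs).
    facts = set(triples) | {("owl:sameAs", "rdf:type", SYM)}
    while True:
        symmetric = {s for (s, p, o) in facts if p == "rdf:type" and o == SYM}
        new = set()
        for (a, p, b) in facts:
            # P(B,A) :- owl:SymmetricProperty(P), P(A,B).
            if p in symmetric:
                new.add((b, p, a))
            # P(B,X) :- owl:sameAs(A,B), P(A,X).
            if p == "owl:sameAs":
                new.update((b, q, x) for (s, q, x) in facts if s == a)
        if new <= facts:
            return facts
        facts |= new

result = infer({(":David", "owl:sameAs", ":DrSmith"), (":David", ":knows", ":Paul")})
print((":DrSmith", ":knows", ":Paul") in result)  # True
```

The loop always terminates because the rules never invent new terms, so only finitely many triples can be derived.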
OWL Properties
Transitive Properties (page 289)
Symmetric/Asymmetric Properties (page 289)
Reflexive/Irreflexive Properties (page 289)
Functional/Inverse-Functional Properties (page 341)
Property inverses (page 342)
Disjoint properties (page 195)
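Each property characteristic above is a condition on the property viewed as a set of (subject, object) pairs. These checks are a sketch under that reading, over small finite relations; the function names are hypothetical, and reflexivity needs an explicit domain to quantify over.

```python
def is_symmetric(rel):
    return all((b, a) in rel for (a, b) in rel)

def is_transitive(rel):
    return all((a, d) in rel for (a, b) in rel for (c, d) in rel if b == c)

def is_reflexive(rel, domain):
    return all((x, x) in rel for x in domain)

def is_functional(rel):
    """At most one object per subject."""
    subjects = [a for (a, _) in rel]
    return len(subjects) == len(set(subjects))

def inverse(rel):
    return {(b, a) for (a, b) in rel}

def is_inverse_functional(rel):
    """At most one subject per object."""
    return is_functional(inverse(rel))

def are_disjoint(rel1, rel2):
    return not (rel1 & rel2)

knows = {(":David", ":Paul")}
print(is_symmetric(knows), is_functional(knows))  # False True
```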
OWL Classes
Defined with Set semantics (Chapter 3, section 3.1)
Handles both instance data (set membership, page 187) and set descriptions
Types are described with a unary predicate (pages 36, 188)
RDF represents this with the rdf:type predicate
Existential (page 36), Universal (page 35), Complementary (page 195), Cardinality (page 320), and Datatype operations
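A sketch of the class operations above as set comprehensions over a small hypothetical dataset. It assumes a closed world (everything not stated is false), which OWL itself does not assume, so the universal and complement results here would not all be entailed by an OWL reasoner. Function names loosely echo OWL terms but are hypothetical helpers.

```python
individuals = {":David", ":Paul", ":Fido"}             # hypothetical domain
person = {":David", ":Paul"}                           # rdf:type :Person
knows = {(":David", ":Paul"), (":Paul", ":Fido")}      # hypothetical :knows pairs

def some_values_from(prop, cls):
    """Existential: things with at least one `prop` value in `cls`."""
    return {a for (a, b) in prop if b in cls}

def all_values_from(prop, cls, domain):
    """Universal: things whose every `prop` value is in `cls` (true if none)."""
    return {a for a in domain if all(b in cls for (x, b) in prop if x == a)}

def complement_of(cls, domain):
    """Complement, relative to a closed domain."""
    return domain - cls

def min_cardinality(prop, n, domain):
    """Things with at least n `prop` values."""
    return {a for a in domain if sum(1 for (x, _) in prop if x == a) >= n}

print(sorted(some_values_from(knows, person)))  # [':David']
```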
OWL in Use
Represent schemas, similar to database schemas
Automated research for candidate drug treatments
NASA inventories