An RDF and XML Database
John Snelson, Lead Engineer
23rd October 2013


Slide 2 Copyright © 2013 MarkLogic® Corporation. All rights reserved.





Data ≠ Information


Data + Context = Information

Dynamic Semantic Publishing: BBC Sports

The Challenge
• Size and complexity
  • # of athletes
  • # of teams
  • # of assets (match reports, statistics, etc.)
  • # of relations (facts)

Goals
• Rich user experience
  • See information in context
  • Personalize content
  • Easy navigation
  • Intelligently serve ads (outside of UK)
• Manageable
  • Static pages? Too many, changing too fast
  • Limited number of journalists
  • Automate as much as possible

Dynamic Semantic Publishing: A Solution

XML Database
• Store, manage documents
  • Stories
  • Blogs
  • Feeds
  • Profiles
• Store, manage values
  • Statistics
• Full-text search
• Performance, scalability
• Robustness

Triple Store
• Metadata about documents
  • Tagged by journalists
  • Added (semi-)automatically
  • Inferred
• Facts reported by journalists
• Linked Open Data for real-world facts

Dynamic Semantic Publishing: Understanding Data

[Figure: entity graph with edges labeled "played in", "plays in", and "plays for"]

Dynamic Semantic Publishing: Scaling Up

What is RDF?

[Figure: RDF graph; surviving labels are :person4, :place5, :first-name, and "John"]

What is RDF?

• Schema-less
• Triple granularity
• Open world assumption
• Joins - the cost of granularity


What is Semantics?

Data stored in Triples

Expressed as Subject : Predicate : Object

"John Smith" : livesIn : "London"
"London" : isIn : "England"

What is Semantics?

Data stored in Triples

Expressed as Subject : Predicate : Object

"John Smith" : livesIn : "London"
"London" : isIn : "England"

Rules tell us something about the triples

If (A livesIn X) AND (X isIn Y) then (A livesIn Y)

Inference: "John Smith" : livesIn : "England"
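The rule on this slide can be tried out with a few lines of Python. This is only a minimal sketch of naive forward chaining over an in-memory set of tuples; the function name `infer_lives_in` is hypothetical, and nothing here is meant to describe how MarkLogic itself evaluates rules.

```python
# Facts from the slide, stored as (subject, predicate, object) tuples.
facts = {
    ("John Smith", "livesIn", "London"),
    ("London", "isIn", "England"),
}


def infer_lives_in(triples):
    """Apply 'if (A livesIn X) and (X isIn Y) then (A livesIn Y)' until nothing new appears."""
    known = set(triples)
    while True:
        derived = set()
        for (a, p1, x) in known:
            if p1 != "livesIn":
                continue
            for (x2, p2, y) in known:
                if p2 == "isIn" and x2 == x:
                    derived.add((a, "livesIn", y))
        if derived <= known:
            # Fixpoint reached: every derivable triple is already known.
            return known
        known |= derived


# Keep only the triples that were not stated explicitly.
inferred = infer_lives_in(facts) - facts
print(inferred)
```

Repeating the rule until a fixpoint is reached means a longer chain (for example a city in a region in a country) is also followed, at the cost of rescanning all triples on every pass.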

What is Semantics?

Data stored in Triples

Expressed as Subject : Predicate : Object

"John Smith" : livesIn : "London"
"London" : isIn : "England"

Rules tell us something about the triples

[Figure: "John Smith" linked to "England" by a livesIn edge]




Semantics Architecture




Triple Index

• 3 triple orders
• Cached for performance
• Works seamlessly with other indexes
• Security
• 150 bytes per triple on disk
• Billions of triples per host
• Scaling out horizontally
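To illustrate why an index might keep several triple orders, here is a small Python sketch. It assumes the three orderings are the rotations subject-predicate-object, predicate-object-subject, and object-subject-predicate; the slide does not say which orders are used, and the class `TripleIndex` and the sample data are hypothetical. With these rotations, any combination of bound positions is a prefix of one ordering, so a lookup is a binary search followed by a short scan.

```python
from bisect import bisect_left

# Assumed orderings (not stated on the slide): each maps sort position -> triple position.
ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}


class TripleIndex:
    def __init__(self, triples):
        # One sorted copy of the data per ordering, each triple permuted into that order.
        self.sorted = {
            name: sorted(tuple(t[i] for i in perm) for t in triples)
            for name, perm in ORDERS.items()
        }

    def match(self, s=None, p=None, o=None):
        """Return triples matching the bound positions (None means unbound)."""
        pattern = (s, p, o)

        def bound_prefix(perm):
            # Number of leading positions of this ordering that are bound.
            n = 0
            for i in perm:
                if pattern[i] is None:
                    break
                n += 1
            return n

        # Pick the ordering whose sort key starts with the most bound terms.
        name, perm = max(ORDERS.items(), key=lambda kv: bound_prefix(kv[1]))
        prefix = tuple(pattern[i] for i in perm[:bound_prefix(perm)])
        rows = self.sorted[name]
        out = []
        # Binary search to the first candidate, then scan while the prefix matches.
        for row in rows[bisect_left(rows, prefix):]:
            if row[:len(prefix)] != prefix:
                break
            triple = [None] * 3
            for pos, i in enumerate(perm):
                triple[i] = row[pos]  # undo the permutation
            out.append(tuple(triple))
        return out


index = TripleIndex([
    (":person4", ":birth-place", ":place5"),
    (":person4", ":first-name", "John"),
    (":person22", ":birth-place", ":place13"),
    (":person22", ":first-name", "John"),
])
print(index.match(p=":first-name", o="John"))
```

The trade-off in this sketch is space for speed: every triple is stored three times so that no lookup has to scan the whole data set.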



RDF Loading


Triples Embedded in Documents

…
<sem:triple>
  <sem:subject> </sem:subject>
  <sem:predicate> </sem:predicate>
  <sem:object datatype=""> Lawford </sem:object>
</sem:triple>
…

Content, Data, and Semantics

<title> Suspicious vehicle near airport
<type> suspicious activity
<category> suspicious vehicle
<object> ABC 123
<description> A blue van with license plate ABC 123 was observed parked behind the airport sign…

Content, Data, and Semantics

[Figure: annotated document; surviving labels are "Suspicious vehicle…", "suspicious activity", "suspicious vehicle", "A blue van…", "ABC 123", "<predicate>", and "Unstructured full-"]

RDF Values

"string value"^^xs:string
"2013-04-09"^^xs:date
"bonjour"@fr



Datatype Mapping

Datatype                   SPARQL           XQuery
Typed Literal
IRI                        <>               sem:iri("http://")
Blank Node                 _:blank1         sem:blank("…")
Simple Literal             "simple"         xs:string("simple")
Language Tagged Literal    "bonjour"@fr
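The SPARQL column of the mapping can be told apart purely by surface syntax. The sketch below classifies a term written in SPARQL notation into the datatype names used in the table. It is a simplification: the regular expressions and the function `classify_term` are hypothetical, escaped quotes inside literals are not handled, and this is not how MarkLogic parses terms.

```python
import re

# Patterns for the term forms shown in the table, checked in order.
# Typed and language-tagged literals must be tested before simple literals.
TERM_PATTERNS = [
    ("IRI", re.compile(r'^<[^<>\s]*>$')),
    ("Blank Node", re.compile(r'^_:\w+$')),
    ("Typed Literal", re.compile(r'^"[^"]*"\^\^\S+$')),
    ("Language Tagged Literal", re.compile(r'^"[^"]*"@[A-Za-z]+(-[A-Za-z0-9]+)*$')),
    ("Simple Literal", re.compile(r'^"[^"]*"$')),
]


def classify_term(text):
    """Return the table's datatype name for a term in SPARQL syntax."""
    for kind, pattern in TERM_PATTERNS:
        if pattern.match(text):
            return kind
    raise ValueError(f"not a recognised RDF term: {text!r}")


for term in ['<http://example.org/a>', '_:blank1', '"simple"',
             '"2013-04-09"^^xs:date', '"bonjour"@fr']:
    print(term, "->", classify_term(term))
```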


• Executed using the triple index
• SPARQL 1.0 + much of SPARQL 1.1
• Cost-based optimization
• Join ordering and algorithms

select * where {
  ?person :birth-place ?place;
          :first-name "John"
}


Executing SPARQL

sem:sparql(
  "prefix : <>
   select * {
     ?person :first-name ?first;
             :last-name ?last;
             :alma-mater [:ivy-league :true]
   }",
  map:entry("first", "John"),
  (),
  cts:collection-query("mycollection"))

Returning Binding Solutions

select * where {
  ?person :birth-place :place5
}

select * where {
  ?person :birth-place ?place;
          :first-name "John"
}

Solution Results

person       place
:person22    :place13
:person4     :place5
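A two-pattern query such as the one binding ?person and ?place is, conceptually, a join between the triples that match each pattern. The Python sketch below shows this with a naive nested-loop join. The data set is hypothetical and was chosen only so that the two result rows above come out; the functions `match_pattern` and `solve` are illustrative names, and a real engine would pick join order and algorithm with a cost-based optimizer rather than loop over everything.

```python
# Hypothetical data, chosen so the query yields the two rows shown on the slide.
TRIPLES = [
    (":person4", ":birth-place", ":place5"),
    (":person4", ":first-name", "John"),
    (":person22", ":birth-place", ":place13"),
    (":person22", ":first-name", "John"),
    (":person7", ":birth-place", ":place5"),
    (":person7", ":first-name", "Kara"),
]


def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None on conflict."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it agrees with an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def solve(patterns, triples):
    """Nested-loop join: each pattern extends the solutions of the previous ones."""
    solutions = [{}]
    for pattern in patterns:
        extended_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    extended_solutions.append(extended)
        solutions = extended_solutions
    return solutions


query = [
    ("?person", ":birth-place", "?place"),
    ("?person", ":first-name", "John"),
]
solutions = solve(query, TRIPLES)
for row in solutions:
    print(row["?person"], row["?place"])
```

The shared variable ?person is what makes this a join: the second pattern only keeps solutions whose ?person already matched the first.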


SPARQL Query Results XML Format

sem:query-result-serialize(
  sem:sparql("select * { … }"),
  "xml")
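For readers unfamiliar with the W3C SPARQL Query Results XML Format, the following Python sketch builds such a document by hand for a solution table. It is not MarkLogic's serializer; the function `to_sparql_xml` is hypothetical, the example.org IRIs are made up, and for brevity every value is assumed to be an IRI (literals and blank nodes would use `literal` and `bnode` elements instead of `uri`).

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SPARQL Query Results XML Format.
SRX_NS = "http://www.w3.org/2005/sparql-results#"


def to_sparql_xml(variables, solutions):
    """Serialize solutions whose values are all IRIs to SPARQL Query Results XML."""
    ET.register_namespace("", SRX_NS)  # emit the namespace as the default xmlns
    root = ET.Element(f"{{{SRX_NS}}}sparql")
    head = ET.SubElement(root, f"{{{SRX_NS}}}head")
    for name in variables:
        ET.SubElement(head, f"{{{SRX_NS}}}variable", name=name)
    results = ET.SubElement(root, f"{{{SRX_NS}}}results")
    for solution in solutions:
        result = ET.SubElement(results, f"{{{SRX_NS}}}result")
        for name in variables:
            binding = ET.SubElement(result, f"{{{SRX_NS}}}binding", name=name)
            ET.SubElement(binding, f"{{{SRX_NS}}}uri").text = solution[name]
    return ET.tostring(root, encoding="unicode")


xml_text = to_sparql_xml(
    ["person", "place"],
    [
        {"person": "http://example.org/person22", "place": "http://example.org/place13"},
        {"person": "http://example.org/person4", "place": "http://example.org/place5"},
    ],
)
print(xml_text)
```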

Returning Triples

describe :person4

construct {
  ?bp :uses-name ?fn
}
where {
  ?person :birth-place ?bp;
          :first-name ?fn
}

Triple Results

sem:triple

:place0 :uses-name "Ethel", "Jeffrey", "Kara" .
:place1 :uses-name "Edward", "James" .
:place10 :uses-name "Robert", "Sheila", "Stephen" .
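A construct query instantiates its template once per solution and returns the resulting set of triples; the output above then groups objects that share a subject and predicate, Turtle-style. The Python sketch below mimics both steps. The bindings are hypothetical values picked to reproduce the first two result lines, and the code is an illustration rather than MarkLogic's implementation.

```python
from collections import defaultdict

# Hypothetical bindings for ?bp and ?fn, as the where clause might produce them.
solutions = [
    {"bp": ":place0", "fn": "Kara"},
    {"bp": ":place0", "fn": "Ethel"},
    {"bp": ":place1", "fn": "James"},
    {"bp": ":place0", "fn": "Jeffrey"},
    {"bp": ":place1", "fn": "Edward"},
    {"bp": ":place0", "fn": "Kara"},  # duplicate solution: yields the same triple
]

# CONSTRUCT step: instantiate the template { ?bp :uses-name ?fn } per solution.
# A set is used because the result is a set of triples, so duplicates collapse.
triples = {(sol["bp"], ":uses-name", sol["fn"]) for sol in solutions}

# Group objects by (subject, predicate) to print Turtle-style object lists.
grouped = defaultdict(list)
for s, p, o in sorted(triples):
    grouped[(s, p)].append(o)

lines = [
    f"{s} {p} " + ", ".join(f'"{o}"' for o in objects) + " ."
    for (s, p), objects in grouped.items()
]
print("\n".join(lines))
```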


Querying Named Graphs

select *
from <http://my_graph>
where { ?s ?p ?o }

select * where {
  graph <http://my_graph> { ?s ?p ?o }
}
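One way to picture named graphs is to store each triple with a fourth component naming its graph. Under that model, both queries above amount to keeping only the triples whose graph is http://my_graph. The Python sketch below shows the idea; the quad data and the function `triples_in_graph` are hypothetical, and how a real store assembles its default dataset is implementation-specific.

```python
# Quads: (graph, subject, predicate, object). All values are made up.
QUADS = [
    ("http://my_graph", ":person4", ":first-name", "John"),
    ("http://my_graph", ":person4", ":birth-place", ":place5"),
    ("http://other_graph", ":person22", ":first-name", "John"),
]


def triples_in_graph(quads, graph):
    """Return the (s, p, o) triples stored in one named graph."""
    return [(s, p, o) for (g, s, p, o) in quads if g == graph]


for s, p, o in triples_in_graph(QUADS, "http://my_graph"):
    print(s, p, o)
```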


Restricting The Datasets

let $options := "properties"
let $query := cts:and-query((
  cts:directory-query("/triples/"),
  cts:element-range-query(xs:QName("date"), ">", $date)
))
return sem:sparql("…", (), (), $options, $query)

Creating Triples

Returning sem:triple values
• sem:triple()
• sem:rdf-parse()
• sem:rdf-get()
• sem:rdf-builder()

Inserting to a database
• sem:rdf-load()
• sem:rdf-insert()

Graph Store API

declare function graph-insert(
  $graphname as sem:iri,
  $triples as sem:triple*,
  [$permissions as element(sec:permission)*,
   $collections as xs:string*,
   $quality as xs:int?,
   $forest-ids as xs:unsignedLong*]
) as xs:string*;

declare function graph-delete(
  $graphname as sem:iri
) as empty-sequence();

• Semantics can enhance your data-oriented and search applications.
• XQuery and SPARQL work well together.
• A combined RDF and XML database simplifies using the two technologies together.
• Try MarkLogic 7:


Any Questions?