Techniques used in RDF Data Publishing at Nature Publishing Group

Post on 11-Jun-2015

Lotico London Semweb Meetup - March 2013

Techniques used in RDF Data Publishing at Nature Publishing Group

Tony Hammond, Data Architect, NPG

March 5, 2013

Nature Publishing Group

● NPG is a division of Macmillan (a privately owned company)
● Publishes ~120 titles in all
  ● 34 Nature branded titles
  ● 53 academic and society journals
  ● 16 magazines (incl. Scientific American)
● ~1000 employees, 17 offices (5 continents)
● ~30 society partners
● Databases, conferences/events, multimedia

Semantic Publishing at NPG

• Prior Work
  • RSS 1.0 webfeeds
  • HTML metadata
  • PDF metadata (XMP)
  • Urchin – RSS aggregator
  • OAI-PMH, OpenSearch (SRU), OpenURL
• Linked Data Apps
  • Public Data: test viability of data publishing
  • Hub: application of technology internally

Public Data

NPG by Numbers

NPG Ontology

Cloud Hosting

• TSO OpenUp® SaaS platform
• Offers 5store as a triplestore
• Scale-out architecture (C/C++)
• Supports up to a trillion triples
• 150,000 tps load speed
• SPARQL 1.0, with 1.1 features (aggregates, etc.)
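To make the "1.1 features (aggregates)" bullet concrete, here is a minimal sketch of a SPARQL 1.1 aggregate query and of the GET URL that would carry it. The endpoint address is assumed from the "data.nature.com/query" slide title (the http scheme is a guess), the query itself is illustrative and not taken from the talk, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Assumption: endpoint taken from the "data.nature.com/query" slide title;
# the http:// scheme is a guess and the service may no longer be available.
ENDPOINT = "http://data.nature.com/query"

# Illustrative SPARQL 1.1 aggregate: count typed resources per class
# across named graphs, largest classes first.
QUERY = """\
SELECT ?class (COUNT(?s) AS ?instances)
WHERE { GRAPH ?g { ?s a ?class } }
GROUP BY ?class
ORDER BY DESC(?instances)
LIMIT 10
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL with the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

COUNT with GROUP BY is the kind of aggregate that SPARQL 1.0 lacks, which is why the bullet singles it out.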

data.nature.com

data.nature.com/query

Hub

Hub: Problem

Hub: Solution

Hub: Method

XMP

Building the Graph

Local Hosting

• Apache Jena TDB
• Single-node architecture (Java)
• Supports up to ~1.5 billion triples (tested)
• SPARQL 1.1

Data Publishing

Hub Finder

Hub Finder: Results

Techniques

Naming Architecture

Naming Policy

Object            Example         Usage
Graph             npgg:gadgets    gadgets:33 ex:title "Title" npgg:gadgets .
Class             npg:Gadget      gadgets:33 a npg:Gadget npgg:gadgets .
Object Property   npg:hasGadget   _:12 npg:hasGadget gadgets:33 npgg:_ .
Data Property     ex:title        gadgets:33 ex:title "Title" npgg:gadgets .
Instance          gadgets:33      gadgets:33 ex:title "Title" npgg:gadgets .

npg:  http://ns.nature.com/terms/
npgg: http://ns.nature.com/graphs/
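The naming policy can be sketched as prefix expansion: each prefixed name resolves to a full URI, and each statement carries its graph name as a fourth term. The snippet below is a minimal illustration, not NPG's tooling. The npg: and npgg: namespaces come from the slide; the gadgets: and ex: namespaces are not given there, so the URIs used for them are placeholders, and `expand` and `quad` are hypothetical helper names.

```python
# npg: and npgg: are as given on the slide. The slide does not define
# gadgets: or ex:, so those two namespace URIs are placeholders (assumptions).
PREFIXES = {
    "npg": "http://ns.nature.com/terms/",
    "npgg": "http://ns.nature.com/graphs/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "gadgets": "http://example.org/gadgets/",  # hypothetical
    "ex": "http://example.org/terms/",         # hypothetical
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'npg:Gadget' into a full URI."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local


def quad(s: str, p: str, o: str, g: str) -> str:
    """Format one N-Quads line; `o` is a prefixed name or a quoted literal."""
    obj = o if o.startswith('"') else f"<{expand(o)}>"
    return f"<{expand(s)}> <{expand(p)}> {obj} <{expand(g)}> ."


# The "Graph" and "Class" usage rows, with `a` written out as rdf:type.
print(quad("gadgets:33", "ex:title", '"Title"', "npgg:gadgets"))
print(quad("gadgets:33", "rdf:type", "npg:Gadget", "npgg:gadgets"))
```

Keeping terms under one namespace and graph names under another means a URI's prefix alone tells a reader whether it names a vocabulary term or a graph.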

Publishing

Monitoring

ETL Process

Datastore: Imports

Datastore: Exports

Contracts

npgg:affiliations
    a npg:Graph, void:Dataset ;
    dcterms:description "Graph of npg:Affiliation objects" ;
    dcterms:issued "2013-02-15"^^xsd:date ;
    dcterms:modified "2013-02-15"^^xsd:date ;
    dcterms:publisher [
        a foaf:Organization ;
        foaf:mbox <mailto:developers@nature.com> ;
        foaf:name "Nature Publishing Group"
    ] ;
    dcterms:source "extractor-xml" ;
    dcterms:title "npgg:affiliations" ;
    rdfs:label "npgg:affiliations" ;
    void:classPartition [ void:class npg:Affiliation ; void:entities "973208"^^xsd:int ] ;
    void:propertyPartition
        [ void:property vcard:url ; void:triples "326"^^xsd:int ],
        [ void:property vcard:street-address ; void:triples "82638"^^xsd:int ],
        [ void:property vcard:region ; void:triples "183483"^^xsd:int ],
        [ void:property vcard:organisation-name ; void:triples "694290"^^xsd:int ],
        [ void:property vcard:locality ; void:triples "412042"^^xsd:int ],
        [ void:property vcard:email ; void:triples "21650"^^xsd:int ],
        [ void:property vcard:country-name ; void:triples 0 ],
        [ void:property rdfs:label ; void:triples "973208"^^xsd:int ],
        [ void:property rdf:type ; void:triples "973208"^^xsd:int ] ;
    void:triples "3340845"^^xsd:int ;
    void:vocabulary npg:, rdf:, rdfs:, void: .
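One way to read a VoID description like this as a contract is to check that its figures agree with each other. The sketch below is an assumption about how such a check might look, not the process described in the talk: it copies the counts from the npgg:affiliations description into a plain dictionary and tests two consistency rules. `check_contract` and the dictionary layout are hypothetical.

```python
# Counts copied from the npgg:affiliations description in the slide.
CONTRACT = {
    "triples": 3340845,
    "class_partition": {"npg:Affiliation": 973208},
    "property_partition": {
        "vcard:url": 326,
        "vcard:street-address": 82638,
        "vcard:region": 183483,
        "vcard:organisation-name": 694290,
        "vcard:locality": 412042,
        "vcard:email": 21650,
        "vcard:country-name": 0,
        "rdfs:label": 973208,
        "rdf:type": 973208,
    },
}


def check_contract(contract: dict) -> list[str]:
    """Return a list of inconsistencies; an empty list means the figures agree."""
    problems = []
    # Rule 1: the per-property triple counts should add up to the graph total.
    per_property = sum(contract["property_partition"].values())
    if per_property != contract["triples"]:
        problems.append(
            f"property partitions sum to {per_property}, "
            f"but void:triples is {contract['triples']}"
        )
    # Rule 2: every entity in a class partition should have one rdf:type triple.
    entities = sum(contract["class_partition"].values())
    typed = contract["property_partition"].get("rdf:type", 0)
    if typed != entities:
        problems.append(f"{entities} entities but {typed} rdf:type triples")
    return problems


print(check_contract(CONTRACT) or "contract holds")
```

For the figures in the slide both rules hold: the nine property partitions add up to the stated total of 3340845 triples, and the rdf:type count matches the 973208 npg:Affiliation entities.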

Linked Data API

• ./api/articles [.json, .rdf, .xml]
• ./api/articles?hasProduct.pcode=ng
• ./api/contributors?familyName=Smith
• ./api/products.json?pcode=ng&_page=2
• ./api/products?_view=none&_properties=pcode
• ./api/search?title=black+hole
• ./api/tree/subjects/children.xml?_sort=title

Closing

Positions Available

goo.gl/bYIt8
www.linkedin.com/jobs?jobId=4890057&viewJob