Techniques used in RDF Data Publishing at Nature Publishing Group
Tony Hammond
Data Architect, NPG
March 5, 2013
Nature Publishing Group
● NPG is a division of Macmillan (a privately owned company)
● Publishes ~120 titles in all
  ● 34 Nature branded titles
  ● 53 academic and society journals
  ● 16 magazines (incl. Scientific American)
● ~1000 employees, 17 offices (5 continents)
● ~30 society partners
● Databases, conferences/events, multimedia
Semantic Publishing at NPG
• Prior Work
  • RSS 1.0 webfeeds
  • HTML metadata
  • PDF metadata (XMP)
  • Urchin – RSS aggregator
  • OAI-PMH, OpenSearch (SRU), OpenURL
• Linked Data Apps
  • Public Data: test viability of data publishing
  • Hub: application of technology internally
Public Data
NPG by Numbers
NPG Ontology
Cloud Hosting
• TSO OpenUp® SaaS platform
• Offers 5store as a triplestore
• Scale-out architecture (C/C++)
• Supports up to a trillion triples
• 150,000 tps load speed
• SPARQL 1.0, with 1.1 features (aggregates, etc.)
data.nature.com
data.nature.com/query
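A query against this endpoint can be sketched as a SPARQL Protocol GET request. The snippet below only builds the request URL and sends nothing. The `http://` scheme, the `query` parameter name (taken from the standard SPARQL Protocol), and the query text itself are assumptions, not something the slides show. The query uses an aggregate, one of the SPARQL 1.1 features listed for the cloud-hosted store.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: the endpoint is served over HTTP and follows the SPARQL Protocol.
ENDPOINT = "http://data.nature.com/query"

# Illustrative aggregate query: count typed resources per class across graphs.
QUERY = """\
SELECT ?class (COUNT(?s) AS ?instances)
WHERE { GRAPH ?g { ?s a ?class } }
GROUP BY ?class
ORDER BY DESC(?instances)
LIMIT 10
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL with the SPARQL query percent-encoded."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)

# Round-trip check: decoding the URL recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
print(decoded == QUERY)
```

Keeping URL construction separate from the HTTP call makes the encoding easy to inspect before anything is sent to the live service.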
Hub
Hub: Problem
Hub: Solution
Hub: Method
XMP
Building the Graph
Local Hosting
• Apache Jena TDB
• Single-node architecture (Java)
• Supports up to ~1.5 billion triples (tested)
• SPARQL 1.1
Data Publishing
Hub Finder
Hub Finder: Results
Techniques
Naming Architecture
Naming Policy
Object           Example        Usage
Graph            npgg:gadgets   gadgets:33 ex:title "Title" npgg:gadgets .
Class            npg:Gadget     gadgets:33 a npg:Gadget npgg:gadgets .
Object Property  npg:hasGadget  _:12 npg:hasGadget gadgets:33 npgg:_ .
Data Property    ex:title       gadgets:33 ex:title "Title" npgg:gadgets .
Instance         gadgets:33     gadgets:33 ex:title "Title" npgg:gadgets .

npg:  http://ns.nature.com/terms/
npgg: http://ns.nature.com/graphs/
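The usage column writes each statement as a quad: subject, predicate, object, graph. As a minimal sketch of how such shorthand expands into N-Quads, the snippet below resolves the prefixes and serializes one line per statement. Only the `npg:` and `npgg:` namespaces come from the slide. The `gadgets:` and `ex:` namespaces are hypothetical placeholders, and the helper names are illustrative.

```python
PREFIXES = {
    "npg": "http://ns.nature.com/terms/",      # from the slide
    "npgg": "http://ns.nature.com/graphs/",    # from the slide
    "gadgets": "http://example.org/gadgets/",  # hypothetical namespace
    "ex": "http://example.org/terms/",         # hypothetical namespace
}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def expand(curie: str) -> str:
    """Expand a prefixed name such as npg:Gadget into a full IRI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def term(value: str) -> str:
    """Serialize one term: literals and blank nodes pass through unchanged."""
    if value.startswith('"') or value.startswith("_:"):
        return value
    if value == "a":  # Turtle shorthand for rdf:type
        return f"<{RDF_TYPE}>"
    return f"<{expand(value)}>"


def quad(s: str, p: str, o: str, g: str) -> str:
    """Return one N-Quads line: subject, predicate, object, graph."""
    return " ".join(term(t) for t in (s, p, o, g)) + " ."


print(quad("gadgets:33", "a", "npg:Gadget", "npgg:gadgets"))
print(quad("gadgets:33", "ex:title", '"Title"', "npgg:gadgets"))
print(quad("_:12", "npg:hasGadget", "gadgets:33", "npgg:_"))
```

Carrying the graph name as the fourth term keeps the graph a statement belongs to visible in every serialized line.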
Publishing
Monitoring
ETL Process
Datastore: Imports
Datastore: Exports
Contracts
npgg:affiliations a npg:Graph, void:Dataset ;
    dcterms:description "Graph of npg:Affiliation objects" ;
    dcterms:issued "2013-02-15"^^xsd:date ;
    dcterms:modified "2013-02-15"^^xsd:date ;
    dcterms:publisher [
        a foaf:Organization ;
        foaf:mbox <mailto:[email protected]> ;
        foaf:name "Nature Publishing Group"
    ] ;
    dcterms:source "extractor-xml" ;
    dcterms:title "npgg:affiliations" ;
    rdfs:label "npgg:affiliations" ;
    void:classPartition [
        void:class npg:Affiliation ;
        void:entities "973208"^^xsd:int
    ] ;
    void:propertyPartition
        [ void:property vcard:url ; void:triples "326"^^xsd:int ],
        [ void:property vcard:street-address ; void:triples "82638"^^xsd:int ],
        [ void:property vcard:region ; void:triples "183483"^^xsd:int ],
        [ void:property vcard:organisation-name ; void:triples "694290"^^xsd:int ],
        [ void:property vcard:locality ; void:triples "412042"^^xsd:int ],
        [ void:property vcard:email ; void:triples "21650"^^xsd:int ],
        [ void:property vcard:country-name ; void:triples 0 ],
        [ void:property rdfs:label ; void:triples "973208"^^xsd:int ],
        [ void:property rdf:type ; void:triples "973208"^^xsd:int ] ;
    void:triples "3340845"^^xsd:int ;
    void:vocabulary npg:, rdf:, rdfs:, void: .
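One thing a contract like this makes possible is a consistency check on the graph's statistics. The sketch below copies the counts from the `npgg:affiliations` description and verifies two things: the per-property triple counts add up to the stated `void:triples` total, and the `rdf:type` count matches the number of `npg:Affiliation` entities. The slides do not say whether NPG validates contracts this way, so the checks and function names are assumptions.

```python
# Counts copied from the npgg:affiliations contract.
PROPERTY_TRIPLES = {
    "vcard:url": 326,
    "vcard:street-address": 82638,
    "vcard:region": 183483,
    "vcard:organisation-name": 694290,
    "vcard:locality": 412042,
    "vcard:email": 21650,
    "vcard:country-name": 0,
    "rdfs:label": 973208,
    "rdf:type": 973208,
}
TOTAL_TRIPLES = 3340845   # void:triples for the whole graph
CLASS_ENTITIES = 973208   # void:entities for npg:Affiliation


def partitions_match_total(partitions: dict, total: int) -> bool:
    """True if the property partitions account for every triple."""
    return sum(partitions.values()) == total


def every_entity_typed(partitions: dict, entities: int) -> bool:
    """True if there is exactly one rdf:type triple per entity."""
    return partitions["rdf:type"] == entities


print(sum(PROPERTY_TRIPLES.values()))
print(partitions_match_total(PROPERTY_TRIPLES, TOTAL_TRIPLES))
print(every_entity_typed(PROPERTY_TRIPLES, CLASS_ENTITIES))
```

For this contract the partitions sum to 3,340,845, which matches the stated total. The zero count for `vcard:country-name` is the kind of gap such a check could also be extended to flag.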
Linked Data API
• ./api/articles [.json, .rdf, .xml]
• ./api/articles?hasProduct.pcode=ng
• ./api/contributors?familyName=Smith
• ./api/products.json?pcode=ng&_page=2
• ./api/products?_view=none&_properties=pcode
• ./api/search?title=black+hole
• ./api/tree/subjects/children.xml?_sort=title
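The request patterns above share one shape: a resource path, an optional format extension, and query parameters, where names starting with an underscore control the response rather than filter it. A minimal sketch of a URL builder follows. The helper `api_url` is hypothetical, and the relative `./api/` base is kept as the slide writes it because the deployment host is not given.

```python
from typing import Optional
from urllib.parse import urlencode


def api_url(resource: str, fmt: Optional[str] = None,
            params: Optional[dict] = None) -> str:
    """Build a relative Linked Data API URL in the style shown on the slide."""
    path = f"./api/{resource}"
    if fmt:
        path += f".{fmt}"  # e.g. .json, .rdf, .xml
    if params:
        path += "?" + urlencode(params)  # spaces become '+'
    return path


print(api_url("articles", params={"hasProduct.pcode": "ng"}))
print(api_url("products", "json", {"pcode": "ng", "_page": 2}))
print(api_url("search", params={"title": "black hole"}))
print(api_url("tree/subjects/children", "xml", {"_sort": "title"}))
```

Passing parameters as a dictionary rather than keyword arguments allows dotted property paths such as `hasProduct.pcode`, which are not valid Python identifiers.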
Closing
Positions Available
goo.gl/bYIt8
www.linkedin.com/jobs?jobId=4890057&viewJob
Information
data.nature.com
developers.nature.com/docs
datahub.io/group/npg
prefix.cc/npg