RDF Database-as-a-Service with S4
Marin Dimitrov, CTO of Ontotext
Apr 27th, 2015
RDF DBaaS with S4 / AKSW Colloquium Apr 2015
Contents
• Self-Service Semantic Suite (S4)
• RDF DBaaS on AWS
• Demo
About Ontotext
• Provides products & solutions for content enrichment and metadata management
– 70 employees, headquarters in Sofia (Bulgaria)
– Sales presence in London, Washington & Boston
• Major clients and industries
– Media & Publishing
– Health Care & Life Sciences
– Cultural Heritage & Digital Libraries
– Government
– Education
The Self-Service Semantic Suite (S4)
What is S4?
• On-demand capabilities for text analytics, content enrichment and metadata management
– Text analytics for news, life sciences and social media
– RDF graph database as-a-service
– Access to large open knowledge graphs
• Available anytime, anywhere
– Simple RESTful services
• Simple, pay-per-use pricing
– No upfront commitments
S4 benefits
• Enables quick prototyping
– Instantly available, no provisioning & operations required
– Focus on building applications, don’t worry about infrastructure
• Free tier
– Even bigger free quotas for research groups & projects
• Easy to start, shorter learning curve
– Various add-ons, SDKs and demo code
• Based on enterprise semantic technology by Ontotext
Text analytics with S4
• Text analytics services
– News annotation
– News categorisation
– Biomedical
• Entity linking & disambiguation
– Mappings to DBpedia & GeoNames instances
– Mappings to biomedical data sources (LinkedLifeData)
• HTML, MS Word, XML, plain text input
• Simple JSON output
News analytics example
[Screenshot: S4 result]
Self-managed RDF DB in the Cloud
• Available from AWS Marketplace
• Variety of hardware configurations
– 2 to 8 CPU cores / 8 to 61 GB RAM
– IOPS performance & encryption (EBS)
• Manage large data volumes
• Pay-per-hour pricing
Fully managed RDF DB in the Cloud
• Low-cost DBaaS available 24/7
• Ideal for small & moderate data volumes
• Instantly deploy new databases when needed
• Zero administration: automated operations, maintenance & upgrades
• Users pay only for the actual database utilisation
– Number of triples stored + number of queries per month
Knowledge graphs with S4
• SPARQL query endpoint to the FactForge knowledge graph
– 500 million entities / 5 billion triples
• Key LOD datasets integrated
– DBpedia, Freebase, GeoNames, WordNet
– Dublin Core, SKOS, PROTON ontologies and vocabularies
Knowledge graphs with S4
• (available soon)
• Knowledge Graph bundles
– DBpedia, Wikidata, GeoNames, …
– GraphDB RDF database (self-managed @ AWS)
– 3rd party interactive data exploration tool (faceted search, data navigation, dynamic charts)
• Get instant & reliable access to KGs without dealing with provisioning, data import, maintenance, …
S4 for developers
• Java & C# SDKs
• Sample code
– Java, C#, NodeJS, JavaScript, Python, PHP, Groovy
– Curl examples for the most impatient
• GATE & UIMA plugins
• Firefox & Chrome add-ons
• Online documentation
Research projects using S4
• DaPaaS & ProDataMarket
– Goal: Open Data / Linked Data publishing & hosting
– S4 role: scalable Linked Data hosting infrastructure
• KConnect
– Goal: semantic annotation, search & analytics for healthcare data
– S4 role: scalable text analytics & RDF data management infrastructure
Fully Managed RDF Database-as-a-Service
Requirements
• Elastic
– Dynamically adapt to data & query volumes
• High availability & resilience
– No single points of failure (SPOFs), “graceful degradation” of performance upon failures
• Cost efficient
– Cost-aware architecture
– Key aspect for Open Data scenarios like DaPaaS & ProDataMarket
• Isolation of the multi-tenant databases
• Fair use of shared resources
RDF DBaaS options on S4
• Micro DB
– Up to 1M triples
– FREE, available now
• Extra Small DB (10M triples)
• Small DB (50M)
• Medium DB (250M)
• Large DB (1B)
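As a rough illustration of the size options above, the following minimal sketch picks the smallest option that fits a given dataset. It assumes the figures on the slide are upper bounds on the number of triples (the slide only says "up to" for the Micro DB); the function name and the tier table are illustrative and not part of any S4 API.

```python
from typing import Optional

# (option name, assumed maximum number of triples), taken from the slide
TIERS = [
    ("Micro", 1_000_000),
    ("Extra Small", 10_000_000),
    ("Small", 50_000_000),
    ("Medium", 250_000_000),
    ("Large", 1_000_000_000),
]


def smallest_tier(triple_count: int) -> Optional[str]:
    """Return the smallest option whose limit covers triple_count, or None."""
    if triple_count < 0:
        raise ValueError("triple_count must be non-negative")
    for name, limit in TIERS:
        if triple_count <= limit:
            return name
    return None  # larger than the biggest listed option


print(smallest_tier(750_000))
print(smallest_tier(120_000_000))
```

A dataset above one billion triples yields `None` here; for such volumes the deck points to the self-managed GraphDB offering on AWS instead.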
Implementation
• AWS based
– Storage, compute, load balancing, integration services…
• Ontotext GraphDB for the database instances
• OpenRDF REST services
• Docker for containerisation
• Network-attached volumes (EBS) for data storage
• A DBaaS on S4 is…
– A GraphDB instance
– Running within a Docker container
– With a private EBS data volume
S4 DBaaS architecture
• Routing nodes
– Expose OpenRDF RESTful services to apps
– Access control & quota checks
– Forward client requests to the proper data node
– Temporarily queue requests when necessary
• Data nodes
– Multiple Docker containers (GDB+EBS) per node
• Coordinator (single)
– Distribute DB initialisation / creation tasks to data nodes
• Management Console
S4 DBaaS architecture
[Architecture diagram. Labels: REST apps, 3rd party RDF tools, Quota & Access Control, routers, data nodes, coordinator, EBS, backups, SNS, Docker Repository, Account management, Quota management, reporting, Monitoring & Logging, Dynamo, Amazon S3, images]
Normal operations
• CRUD
– Router node receives a request
– Routes it to the proper data node & container
– Receives a response, forwards it back to client app
• Routing updates
– Data nodes push notifications via SNS: “heartbeats” + changes regarding the hosted DBs (if any)
– Each routing node receives the notifications (via SNS) and updates its routing tables
– Coordinator also receives notifications, learns which DBs are operational / down for maintenance
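The routing-update flow can be sketched as a routing table that is refreshed by heartbeats. This is only an illustration under stated assumptions: the `Heartbeat` fields, the node and database names, and the 30-second expiry are hypothetical, and the SNS delivery itself is left out (heartbeats are passed in as plain objects).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Heartbeat:
    node: str              # data node identifier (hypothetical format)
    databases: List[str]   # DBs currently hosted on that node
    sent_at: float         # timestamp in seconds


@dataclass
class RoutingTable:
    ttl: float = 30.0      # assumed: a route expires 30 s after the last heartbeat
    routes: Dict[str, Tuple[str, float]] = field(default_factory=dict)  # db -> (node, last_seen)

    def apply(self, hb: Heartbeat) -> None:
        """Update routes from one heartbeat notification."""
        # Drop DBs this node no longer reports, then refresh the ones it does.
        gone = [db for db, (node, _) in self.routes.items()
                if node == hb.node and db not in hb.databases]
        for db in gone:
            del self.routes[db]
        for db in hb.databases:
            self.routes[db] = (hb.node, hb.sent_at)

    def lookup(self, db: str, now: float) -> Optional[str]:
        """Return the data node hosting db, or None if unknown or expired."""
        entry = self.routes.get(db)
        if entry is None or now - entry[1] > self.ttl:
            return None
        return entry[0]


table = RoutingTable()
table.apply(Heartbeat("node-a", ["demo01"], sent_at=100.0))
print(table.lookup("demo01", now=110.0))
print(table.lookup("demo01", now=200.0))
```

Expiring routes after a missed heartbeat is one plausible way for a router to notice a crashed data node; the slides do not say how S4 detects this.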
Failure case #1 – data node crash
[Diagram: S4 DBaaS architecture annotated with steps 1 to 3]
Recovery from a data node crash
[Diagram: S4 DBaaS architecture with Auto Scaling, annotated with steps 1 to 7]
Failure case #2 – router crash & recovery
[Diagram: S4 DBaaS architecture with Auto Scaling, annotated with steps 1 to 8]
Recovery from a router crash
• (Open connections from client apps to the crashed router node are terminated)
• Auto-scaler starts a new router node
– New router subscribes to SNS for heartbeats & updates
• Load balancer starts sending new client requests to the new router
– Router puts them in the local queue (if routing table is still incomplete)
• Heartbeats from data nodes are received
– Routing information is now complete
– Router starts sending the queued requests to data nodes
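The queue-then-replay behaviour of a freshly started router can be sketched as follows. This is a simplification under assumptions: the class and method names are hypothetical, requests are plain strings, and a request is queued whenever the route for its database is still unknown (the slide only says requests are queued while the routing table is incomplete).

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class RecoveringRouter:
    def __init__(self, forward: Callable[[str, str], None]) -> None:
        self._forward = forward                           # callback: (node, request)
        self._routes: Dict[str, str] = {}                 # db -> data node
        self._pending: Deque[Tuple[str, str]] = deque()   # queued (db, request) pairs

    def handle(self, db: str, request: str) -> None:
        """Forward the request if the route is known, otherwise queue it locally."""
        node = self._routes.get(db)
        if node is None:
            self._pending.append((db, request))
        else:
            self._forward(node, request)

    def on_heartbeat(self, node: str, databases: List[str]) -> None:
        """Learn routes from a heartbeat, then replay queued requests in order."""
        for db in databases:
            self._routes[db] = node
        still_waiting: Deque[Tuple[str, str]] = deque()
        while self._pending:
            db, request = self._pending.popleft()
            target = self._routes.get(db)
            if target is None:
                still_waiting.append((db, request))
            else:
                self._forward(target, request)
        self._pending = still_waiting


sent: List[Tuple[str, str]] = []
router = RecoveringRouter(lambda node, req: sent.append((node, req)))
router.handle("demo01", "query-1")          # no route yet: queued
router.on_heartbeat("node-a", ["demo01"])   # route learned: queued request replayed
router.handle("demo01", "query-2")          # forwarded immediately
print(sent)
```

A production router would presumably also bound the queue and time out requests that wait too long; neither is described in the slides.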
Failure case #3 – coordinator crash & recovery
[Diagram: S4 DBaaS architecture with Auto Scaling, annotated with a "Create DB" request (step 1) and steps 2 to 6]
Failure case #3 – coordinator crash & recovery
• Routers can route requests to data nodes as usual
– … but new DBs cannot be created temporarily
– … and data nodes with free container slots can’t get info on DBs waiting for initialisation
• AWS Auto-scaler starts a new Coordinator node
– Coordinator reads a list of all registered DBs from the metadata store & subscribes to SNS
• Coordinator starts receiving heartbeats & updates from data nodes
– … learns which DBs are operational / pending
– … and resumes distributing new / pending DB initialisation tasks to the data nodes with free slots
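The coordinator's recovery logic can be pictured with a small state table. The sketch below is an assumption-laden illustration: the state names ("pending", "initialising", "operational") and the method names are hypothetical, and the metadata store and SNS subscription are replaced by plain arguments.

```python
from typing import Dict, Iterable, List, Optional


class Coordinator:
    def __init__(self, registered_dbs: Iterable[str]) -> None:
        # After a restart, every registered DB counts as pending
        # until a heartbeat confirms that it is being served.
        self.status: Dict[str, str] = {db: "pending" for db in registered_dbs}

    def on_heartbeat(self, hosted_dbs: List[str]) -> None:
        """A data node reports the DBs it currently hosts."""
        for db in hosted_dbs:
            if db in self.status:
                self.status[db] = "operational"

    def next_task(self) -> Optional[str]:
        """Called when a data node with a free slot asks for a DB to initialise."""
        for db, state in self.status.items():
            if state == "pending":
                self.status[db] = "initialising"
                return db
        return None


coordinator = Coordinator(["demo01", "demo02"])   # list read from the metadata store
coordinator.on_heartbeat(["demo01"])              # demo01 is already being served
print(coordinator.next_task())
print(coordinator.next_task())
```

In practice the coordinator would likely wait for a round of heartbeats before handing out pending DBs, so that databases which are running but have not yet reported are not initialised twice; the slides do not specify this detail.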
Composite failure & recovery
• Combination of coordinator + data node + routing node crash: handled the same as failure cases #1 + #2 + #3
• Routers depend on data nodes
• Data nodes depend on Coordinator
• Coordinator does not depend on other nodes
– No heartbeats coming means all DBs are down
– Start distributing DB initialisation tasks whenever a request comes from a working data node
– Eventually, all data nodes are up, DBs initialised, heartbeats & routing updates start coming
– … and routers can start routing client requests
Management interface
[Screenshot callouts: Micro, XS, S, M, or L; I/O performance; R/O access to Open Data services or open knowledge graphs]
Management interface
[Screenshot callouts: DBaaS endpoint; DB details summary; Backup, export, change settings, delete; Run a test query]
Roadmap
• Gradually introduce XS, S, M and L instances
• Integration with the GraphDB Workbench management UI
• LDF based containers
• Multi-datacenter deployment
• Replication across datacenters (single master)
More Details
• “On-demand Text Analytics and Metadata Management with S4” (ESaaSA @ CLOSER’2015)
• “Text Analytics and Linked Data Management As-a-Service with S4” (Wasabi @ ESWC’2015)
• “Low-cost Open Data As-a-Service in the Cloud” (SemDev @ ESWC’2015)
Demo
Demo scenario
• (Create an account & generate an API key pair)
• Create a new DB
• Create a new repository in the DB
– via the REST API / OpenRDF Java SDK / curl
– …or via UI tools like the OpenRDF Workbench
• Import sample data (REST / OpenRDF Workbench)
• Run a query through the public SPARQL endpoint
Demo data – Universities in Saxony
#1 Create a database
#2a Create a repository & load data (curl)
Repository configuration file: config.ttl
• Repository name: "test01"
• OWL-Horst reasoning ruleset

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rep: <http://www.openrdf.org/config/repository#>.
@prefix sr: <http://www.openrdf.org/config/repository/sail#>.
@prefix sail: <http://www.openrdf.org/config/sail#>.
@prefix graphdb: <http://www.ontotext.com/trree/owlim#>.

[] a rep:Repository ;
    rep:repositoryID "test01" ;
    rdfs:label "Description of my repository" ;
    rep:repositoryImpl [
        rep:repositoryType "openrdf:SailRepository" ;
        sr:sailImpl [
            graphdb:ruleset "owl-horst-optimized" ;
            sail:sailType "owlim:Sail" ;
            graphdb:base-URL "http://example.org/graphdb#" ;
            graphdb:repository-type "file-repository" ;
        ]
    ].
#2a Create a repository & load data (curl)
• User: 4730361296
• Database: demo01
• Repository: test01
• Configuration: config.ttl

API_KEY=…
KEY_SECRET=…
USER=…
DATABASE=…
REPOSITORY=…
SERVICE_ENDPOINT="https://$API_KEY:[email protected]/$USER/$DATABASE"

# Create a repository: upload config.ttl into a named graph of the SYSTEM repository,
# then declare that graph as a repository context
curl -X POST -H "Content-Type:application/x-turtle" -T config.ttl "$SERVICE_ENDPOINT/repositories/SYSTEM/rdf-graphs/service?graph=http://example.com#g1"
curl -X POST -H "Content-Type:application/x-turtle" -d "<http://example.com#g1> a <http://www.openrdf.org/config/repository#RepositoryContext>." "$SERVICE_ENDPOINT/repositories/SYSTEM/statements"

# Upload sample data from example.rdf
curl -X POST -H "Content-Type:application/rdf+xml;charset=UTF-8" -T example.rdf "$SERVICE_ENDPOINT/repositories/$REPOSITORY/statements"
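For readers who prefer a script over curl, the data upload step might look roughly like this in Python. It is a hedged sketch, not the S4 SDK: the host `dbaas.example.org`, the key values and the function name are placeholders, and the request is only built, not sent. curl turns credentials embedded in the URL into an HTTP Basic `Authorization` header, which is reproduced explicitly here.

```python
import base64
import urllib.request


def build_upload_request(endpoint: str, api_key: str, key_secret: str,
                         repository: str, rdf_xml: bytes) -> urllib.request.Request:
    """Build (but do not send) the POST that adds RDF/XML statements to a repository."""
    token = base64.b64encode(f"{api_key}:{key_secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{endpoint}/repositories/{repository}/statements",
        data=rdf_xml,
        method="POST",
        headers={
            "Content-Type": "application/rdf+xml;charset=UTF-8",
            "Authorization": f"Basic {token}",
        },
    )


req = build_upload_request(
    endpoint="https://dbaas.example.org/4730361296/demo01",  # hypothetical host
    api_key="my-key",                                        # placeholder credentials
    key_secret="my-secret",
    repository="test01",
    rdf_xml=b"<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'/>",
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would send it; in real use the body would be the contents of `example.rdf`.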
#2b Create a repository & load data (OpenRDF Workbench)
[Screenshots of the OpenRDF Workbench; callout: DBaaS endpoint]
#3a SPARQL query (OpenRDF Workbench)
#3b SPARQL query (from the S4 Management Console)

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-prop: <http://dbpedia.org/property/>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>

SELECT ?name ?numberOfStudents ?staff ?established
WHERE {
    dbpedia:University_of_Leipzig rdfs:label ?name ;
        dbp-prop:students ?numberOfStudents ;
        dbp-prop:staff ?staff ;
        dbp-prop:established ?established .
}
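The same query could also be sent programmatically. The sketch below builds a GET request in the style of the OpenRDF (Sesame) REST protocol, where the query is passed as a URL-encoded `query` parameter to `/repositories/<id>`. The host is hypothetical, authentication is omitted, and the request is only constructed, not sent.

```python
import urllib.parse
import urllib.request

QUERY = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-prop: <http://dbpedia.org/property/>

SELECT ?name ?numberOfStudents ?staff ?established
WHERE {
    dbpedia:University_of_Leipzig rdfs:label ?name ;
        dbp-prop:students ?numberOfStudents ;
        dbp-prop:staff ?staff ;
        dbp-prop:established ?established .
}
"""


def build_query_request(endpoint: str, repository: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a SPARQL SELECT request asking for JSON results."""
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url=f"{endpoint}/repositories/{repository}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


request = build_query_request(
    "https://dbaas.example.org/4730361296/demo01",  # hypothetical host
    "test01",
    QUERY,
)
print(request.get_method())
```

URL-encoding matters here because the query contains `#`, `<`, `>` and line breaks, none of which can appear unescaped in a query string.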
Key takeaways
• S4 provides an enterprise RDF DBaaS
• Resilient design, high availability
• Instantly available whenever needed, easy to use, OpenRDF REST services
• Zero administration: automated operations, maintenance & upgrades
• Free DBs up to 1M triples (even more for research teams & projects)
• Check out http://s4.ontotext.com
Thank you!