1 Het begint met een idee
BRIDGING THE GAP BETWEEN RESTFUL APIS AND LINKED DATA
Albert Meroño-Peñuela, Rinke Hoekstra & many others
CLARIAH Tech Day, 07-10-2016
Vrije Universiteit Amsterdam
ACCESSING LINKED DATA
• Multiple Linked Data consuming applications
• Variety of access interfaces needed
GITHUB AS A HUB OF SPARQL QUERIES
• One .rq file per SPARQL query
• Good support for query curation processes
> Versioning
> Branching
> Clone-pull-push
• Web-friendly features!
> One URI per query
> Uniquely identifiable
> De-referenceable (raw.githubusercontent.com)
Rinke: this is an asset in itself. We need to be able to keep the queries we use to answer research questions for reproducibility
MEANWHILE IN THE SEMANTIC WEB…
• Linked Data APIs emerge
• RESTful entry point to Linked Data hubs for Web applications
> OpenPHACTS
• …but the Linked Data API (e.g. Swagger spec, code itself) still needs to be coded and maintained
• Cousin of BASIL in a SALAD
• Same basic principle: 1 SPARQL query = 1 API operation
• Automatically builds Swagger spec and UI from SPARQL
• But: external query management
• Organization of SPARQL queries in the GitHub repo matches organization of the API
• Thin layer: nothing stored server-side
• Maps
> GitHub API
> Swagger spec
Meroño & Hoekstra. ‘grlc Makes GitHub Taste Like Linked Data APIs’. SALAD, ESWC (2016)
MAPPING GITHUB AND SWAGGER
SPARQL DECORATOR SYNTAX
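The only decorator spelled out in this deck is `#+ pagination: 100`. As a rough illustration, assuming every decorator is a comment line of the form `#+ key: value` placed in the .rq file, a parser could separate decorators from the query like this. The `summary` key, the sample query, and the function name `parse_decorators` are hypothetical and used only for the example; this is a sketch, not grlc's own parser.

```python
import re

# Assumed decorator shape: a comment line "#+ key: value" inside a .rq file.
DECORATOR = re.compile(r"^#\+\s*(\w+)\s*:\s*(.*)$")


def parse_decorators(rq_text):
    """Split the text of a .rq file into (decorators dict, SPARQL string)."""
    decorators = {}
    query_lines = []
    for line in rq_text.splitlines():
        match = DECORATOR.match(line.strip())
        if match:
            # Decorator line: remember the key and its raw string value.
            decorators[match.group(1)] = match.group(2).strip()
        else:
            # Anything else is treated as part of the SPARQL query.
            query_lines.append(line)
    return decorators, "\n".join(query_lines).strip()


# Hypothetical file content; only "pagination" appears in the slides.
example = """#+ summary: List all house types
#+ pagination: 100
SELECT DISTINCT ?houseType WHERE { ?s ?p ?houseType } LIMIT 10
"""

decorators, sparql = parse_decorators(example)
```

Values are kept as strings here; a consumer that needs the page size as a number would convert `decorators["pagination"]` itself.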
THE GRLC SERVICE
Assuming your repo is at https://github.com/:owner/:repo and your grlc instance at :host,
> http://:host/api/:owner/:repo/spec returns the JSON swagger spec
> http://:host/api/:owner/:repo/api-docs returns the swagger UI
> http://:host/api/:owner/:repo/:operation?p_1=v_1...p_n=v_n calls operation with specified parameter values
> Uses BASIL’s SPARQL variable name convention for query parameters
Sends requests to
> https://api.github.com/repos/:owner/:repo to look for SPARQL queries and their decorators
> https://raw.githubusercontent.com/:owner/:repo/master/file.rq to dereference queries, get the SPARQL, and parse it
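The URL templates above can be written out as a small helper. This is a minimal sketch that only builds strings from the patterns on this slide; the function names, and the host and repository used in the usage line, are illustrative choices rather than part of grlc.

```python
from urllib.parse import urlencode


def grlc_urls(host, owner, repo):
    """Return the fixed grlc and GitHub URLs for one repository."""
    base = f"http://{host}/api/{owner}/{repo}"
    return {
        "spec": f"{base}/spec",          # JSON swagger spec
        "api_docs": f"{base}/api-docs",  # swagger UI
        "github_api": f"https://api.github.com/repos/{owner}/{repo}",
    }


def operation_url(host, owner, repo, operation, **params):
    """Build the URL that calls one operation with the given parameters."""
    url = f"http://{host}/api/{owner}/{repo}/{operation}"
    if params:
        url += "?" + urlencode(params)
    return url


def raw_query_url(owner, repo, filename, branch="master"):
    """Build the raw.githubusercontent.com URL of a .rq file."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{filename}"


urls = grlc_urls("localhost:8088", "CEDAR-project", "Queries")
call = operation_url("localhost:8088", "CEDAR-project", "Queries",
                     "houseType_all", page=3)
```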
DROPDOWNS
• Fills in the swag[paths][op][method][parameters][enum] array
• Uses the de-contextualized triple pattern of the SPARQL query’s BGP against the same SPARQL endpoint
• Very inefficient
• JSON spec caching via reverse proxy
• LOD cache
• Own dimension/codelist cache
• Unmapped parameter ambiguity if the user wants to mix enum with arbitrary parameter values (“all values”)
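The dropdown mechanism can be sketched in two steps: take the triple pattern that mentions the parameter out of its query context and ask the endpoint for its distinct values, then write those values into the `enum` array of the spec. The code below is a sketch under assumptions: the function names, the example IRI, the `/houseType_all` path key, and the shape of the `swag` dictionary are hypothetical, and the call to the SPARQL endpoint is left out (the values are passed in as a list).

```python
def enumeration_query(triple_pattern, variable):
    """Build a query listing the distinct values of one parameter variable.

    The triple pattern is used on its own, without the rest of the BGP.
    """
    return f"SELECT DISTINCT ?{variable} WHERE {{ {triple_pattern} }}"


def fill_enum(swag, path, method, param_name, values):
    """Store the values in swag[paths][path][method][parameters][enum]."""
    for param in swag["paths"][path][method]["parameters"]:
        if param["name"] == param_name:
            param["enum"] = sorted(set(values))
    return swag


# Hypothetical spec fragment and values, for illustration only.
swag = {
    "paths": {
        "/houseType_all": {
            "get": {"parameters": [{"name": "type", "in": "query"}]}
        }
    }
}
query = enumeration_query("?house <http://example.org/houseType> ?_type", "_type")
fill_enum(swag, "/houseType_all", "get", "type", ["farm", "boat", "farm"])
```

Running one such query per parameter every time the spec is built is what makes the approach expensive, which is why caching comes up in the bullets above.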
CONTENT NEGOTIATION
• API endpoints can now end with .content_type (e.g. grlc.io/CLARIAH/wp-queries/MyQuery.csv)
• Supports .csv, .json, .html (can be extended)
• grlc sets ‘Accept’ HTTP header and agnostically returns same ‘Content-Type’ as the SPARQL endpoint
• Up to the SPARQL endpoint to accept it
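A minimal sketch of the suffix handling, assuming the extension is simply looked up in a table and turned into an `Accept` value. The table name, the function name, and the exact MIME strings for .json and .html are assumptions; only `text/csv` appears elsewhere in this deck.

```python
# Assumed mapping from URL suffix to the Accept header value.
MIME_BY_EXTENSION = {
    "csv": "text/csv",
    "json": "application/json",
    "html": "text/html",
}


def split_content_type(route):
    """Return (operation route, Accept value or None) for a requested route."""
    name, dot, extension = route.rpartition(".")
    if dot and extension in MIME_BY_EXTENSION:
        return name, MIME_BY_EXTENSION[extension]
    # No known suffix: leave the route alone and let the client's own
    # Accept header decide.
    return route, None


operation, accept = split_content_type("CLARIAH/wp-queries/MyQuery.csv")
```

Supporting another format would then amount to adding one entry to the table, provided the SPARQL endpoint accepts that content type.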
PAGINATION
• Large query results are typically nasty to consuming applications
• Split the result in multiple parts (or “pages”)
• Page size? Set it with the decorator #+ pagination: 100
• Navigating pages
• rel=next,prev,first,last links in the HTTP headers (GitHub API Traversal convention)
• Extra request parameter ?page (defaults to 1)
~ curl -X GET -H"Accept: text/csv" -I http://localhost:8088/api/CEDAR-project/Queries/houseType_all
HTTP/1.0 200 OK
Content-Type: text/csv; charset=UTF-8
Content-Length: 18447
Server: grlc/1.0.0
Link: <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=2>; rel=next, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=889>; rel=last

~ curl -X GET -H"Accept: text/csv" -I http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=3
HTTP/1.0 200 OK
Content-Type: text/csv; charset=UTF-8
Content-Length: 18142
Server: grlc/1.0.0
Link: <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=4>; rel=next, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=2>; rel=prev, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=1>; rel=first, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=889>; rel=last
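On the client side, the `Link` header shown above can be turned into a dictionary of page URLs. This is a sketch of one way a consuming application might do it; the function names are illustrative, and the regular expression assumes the `<url>; rel=name` shape seen in the responses above (with or without quotes around the relation name).

```python
import re
from urllib.parse import urlparse, parse_qs

# One "<url>; rel=name" entry; quotes around the relation are optional.
LINK_PART = re.compile(r'<([^>]+)>\s*;\s*rel="?([\w-]+)"?')


def parse_link_header(value):
    """Map each relation (next, prev, first, last) to its URL."""
    return {rel: url for url, rel in LINK_PART.findall(value)}


def page_number(url):
    """Read the ?page parameter of a URL, defaulting to 1 when absent."""
    query = parse_qs(urlparse(url).query)
    return int(query.get("page", ["1"])[0])


header = (
    "<http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=4>; rel=next, "
    "<http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=2>; rel=prev, "
    "<http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=1>; rel=first, "
    "<http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=889>; rel=last"
)
links = parse_link_header(header)
```

A client can keep following `links["next"]` until that relation is missing from the response, which signals the last page.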
CACHE
• Moved implementation outside of grlc (not its direct responsibility)
• grlc sets HTTP header Cache-Control to public, max-age=900 (15 minutes, customizable)
• nginx caches all grlc generated JSON (and other static/dynamic assets)
• nginx becomes part of the bundle
CONTAINER RELEASE
• Uses docker
• Infrastructure-independent install
• Bundles (composes) all required packages (python, python libs, grlc, nginx). Can be easily extended to more
• Publicly available at hub.docker.com
• One-command server deploy: docker pull clariah/grlc
CONCLUSIONS
• The spectrum of Linked Data clients: SPARQL-intensive applications vs RESTful API applications
• grlc decouples SPARQL from all client applications (including LDAs), which is a powerful practice
• Separates query curation workflows from everything else
• Allows at the same time
> Web-friendly SPARQL queries
> Web-friendly RESTful APIs
• Helps you organise your LDA easily: just organise your SPARQL repository and you’re set
• Try it out!
> http://grlc.io/
> https://github.com/CLARIAH/grlc
Finish with the curl -X GET that gives the result of the original query in the crappy script
THANK YOU!
@ALBERTMERONYO
DATALEGEND.NET
CLARIAH.NL