grlc: Bridging the Gap Between RESTful APIs and Linked Data

1 Het begint met een idee

BRIDGING THE GAP BETWEEN RESTFUL APIS AND LINKED DATA

Albert Meroño-Peñuela, Rinke Hoekstra & many others

CLARIAH Tech Day, 07-10-2016

Vrije Universiteit Amsterdam

ACCESSING LINKED DATA


• Multiple Linked Data consuming applications
• Variety of access interfaces needed


GITHUB AS A HUB OF SPARQL QUERIES

• One .rq file per SPARQL query
• Good support for query curation processes
> Versioning
> Branching
> Clone-pull-push
• Web-friendly features!
> One URI per query
> Uniquely identifiable
> De-referenceable (raw.githubusercontent.com)


Rinke: this is an asset in itself. We need to be able to keep the queries we use to answer research questions for reproducibility


MEANWHILE IN THE SEMANTIC WEB…

• Linked Data APIs emerge
• RESTful entry point to Linked Data hubs for Web applications
• OpenPHACTS
• …but the Linked Data API (e.g. Swagger spec, code itself) still needs to be coded and maintained


• Cousin of BASIL in a SALAD
• Same basic principle: 1 SPARQL query = 1 API operation
• Automatically builds Swagger spec and UI from SPARQL
• But: external query management
• Organization of SPARQL queries in the GitHub repo matches organization of the API
• Thin layer – nothing stored server-side
• Maps
> GitHub API
> Swagger spec

Meroño & Hoekstra. ‘grlc Makes GitHub Taste Like Linked Data APIs’. SALAD, ESWC (2016)

MAPPING GITHUB AND SWAGGER



SPARQL DECORATOR SYNTAX
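The figure for this slide did not survive extraction. The only decorator that appears in the text of this deck is `#+ pagination: 100`, so the following is a minimal sketch that assumes decorators are comment lines of the form `#+ key: value` inside the .rq file. The `summary` key, the example query, and the helper name `parse_decorators` are illustrative assumptions, not grlc's own parser.

```python
def parse_decorators(query_text):
    """Collect '#+ key: value' comment lines from a SPARQL query string."""
    decorators = {}
    for line in query_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("#+"):
            continue  # ordinary SPARQL or a plain comment
        key, sep, value = stripped[2:].partition(":")
        if sep:  # skip '#+' lines that are not shaped like 'key: value'
            decorators[key.strip()] = value.strip()
    return decorators


# Hypothetical .rq content; only 'pagination' is taken from the slides.
EXAMPLE_QUERY = """\
#+ summary: Hypothetical example query
#+ pagination: 100

SELECT ?s ?p ?o WHERE { ?s ?p ?o }
"""

decorators = parse_decorators(EXAMPLE_QUERY)
```

Because SPARQL treats lines starting with `#` as comments, a file decorated this way remains a valid query for any endpoint.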



THE GRLC SERVICE

Assuming your repo is at https://github.com/:owner/:repo and your grlc instance at :host,

> http://:host/api/:owner/:repo/spec returns the JSON swagger spec
> http://:host/api/:owner/:repo/api-docs returns the swagger UI
> http://:host/api/:owner/:repo/:operation?p_1=v_1...p_n=v_n calls operation with specified parameter values
> Uses BASIL’s SPARQL variable name convention for query parameters

Sends requests to
> https://api.github.com/repos/:owner/:repo to look for SPARQL queries and their decorators
> https://raw.githubusercontent.com/:owner/:repo/master/file.rq to dereference queries, get the SPARQL, and parse it
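The URL templates above can be sketched as plain string building. This is only an illustration of the route layout described on the slide; the function names are hypothetical, and the host, owner, repo, and operation values in the usage line are borrowed from the pagination example later in the deck.

```python
from urllib.parse import urlencode


def grlc_urls(host, owner, repo):
    """URLs a grlc instance exposes for one GitHub repository."""
    base = f"http://{host}/api/{owner}/{repo}"
    return {"spec": f"{base}/spec", "api_docs": f"{base}/api-docs"}


def operation_url(host, owner, repo, operation, params=None):
    """URL that calls one operation (one .rq file) with query parameters."""
    url = f"http://{host}/api/{owner}/{repo}/{operation}"
    if params:
        url += "?" + urlencode(params)
    return url


def github_urls(owner, repo, filename):
    """URLs grlc requests on the GitHub side, per the slide."""
    return {
        "repo_api": f"https://api.github.com/repos/{owner}/{repo}",
        "raw_query": f"https://raw.githubusercontent.com/{owner}/{repo}/master/{filename}",
    }


example = operation_url("localhost:8088", "CEDAR-project", "Queries",
                        "houseType_all", {"page": 2})
```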


DROPDOWNS

• Fills in the swag[paths][op][method][parameters][enum] array

• Uses the de-contextualized triple pattern of the SPARQL query’s BGP against the same SPARQL endpoint

• Very inefficient

• JSON spec caching via reverse proxy

• LOD cache

• Own dimension/codelist cache

• Unmapped parameter ambiguity if the user wants to mix enum with arbitrary parameter values (“all values”)
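As a rough illustration of the first bullet, the sketch below attaches an `enum` list to one parameter of one operation in a Swagger-style dictionary. The spec fragment, the `year` parameter, and the values are made up for the example; in the setup described above the values would come from running the parameter's triple pattern against the SPARQL endpoint, which is not shown here.

```python
def set_enum(swag, path, method, param_name, values):
    """Attach an 'enum' list to one named parameter of one operation.

    Returns True if the parameter was found, False otherwise.
    """
    for param in swag["paths"][path][method]["parameters"]:
        if param["name"] == param_name:
            # De-duplicate and sort so the dropdown is stable across runs.
            param["enum"] = sorted(set(values))
            return True
    return False


# Hypothetical spec fragment with a single query parameter.
swag = {
    "paths": {
        "/houseType_all": {
            "get": {
                "parameters": [
                    {"name": "year", "in": "query", "type": "string"}
                ]
            }
        }
    }
}

found = set_enum(swag, "/houseType_all", "get", "year", ["1889", "1879", "1889"])
```

A parameter that carries an `enum` is rendered by Swagger UI as a dropdown, which is also why mixing enumerated and arbitrary values is awkward.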



CONTENT NEGOTIATION

• API endpoints can now end with .content_type (e.g. grlc.io/CLARIAH/wp-queries/MyQuery.csv)

• Supports .csv, .json, .html (can be extended)

• grlc sets ‘Accept’ HTTP header and agnostically returns same ‘Content-Type’ as the SPARQL endpoint

• Up to the SPARQL endpoint to accept it
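The extension-to-header step can be sketched as a small lookup. The three extensions are the ones listed above; the MIME types they map to here are an assumption (only `text/csv` appears elsewhere in this deck), and `split_content_type` is a hypothetical helper name.

```python
import posixpath

# Assumed mapping; extend it to support more formats.
ACCEPT_BY_EXTENSION = {
    ".csv": "text/csv",
    ".json": "application/json",
    ".html": "text/html",
}


def split_content_type(path):
    """Split '/owner/repo/Query.csv' into the bare path and an Accept value.

    Paths without a known extension are returned unchanged with None,
    so the caller can fall back to the client's own Accept header.
    """
    root, ext = posixpath.splitext(path)
    mime = ACCEPT_BY_EXTENSION.get(ext.lower())
    if mime is None:
        return path, None
    return root, mime


operation, accept = split_content_type("/CLARIAH/wp-queries/MyQuery.csv")
```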


PAGINATION

• Large query results are typically nasty to consuming applications

• Split the result in multiple parts (or “pages”)

• Size? #+ pagination: 100

• Navigating pages

• rel=next,prev,first,last links in the HTTP headers (GitHub API Traversal convention)

• Extra request parameter ?page (defaults to 1)
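The slides do not say how a page is cut out of the full result. One common way, assumed here purely for illustration, is to translate the page number and the `#+ pagination` size into SPARQL `LIMIT` and `OFFSET` clauses; both function names are hypothetical.

```python
def page_window(page=1, page_size=100):
    """Return (limit, offset) for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return page_size, (page - 1) * page_size


def paginate_query(query, page=1, page_size=100):
    """Append LIMIT/OFFSET to a query that has neither clause yet."""
    limit, offset = page_window(page, page_size)
    return f"{query.rstrip()}\nLIMIT {limit} OFFSET {offset}"


paged = paginate_query("SELECT ?s WHERE { ?s ?p ?o }", page=3, page_size=100)
```

With a page size of 100, page 3 covers results 201 to 300, so the offset is 200.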

~ curl -X GET -H"Accept: text/csv" -I http://localhost:8088/api/CEDAR-project/Queries/houseType_all

HTTP/1.0 200 OK
Content-Type: text/csv; charset=UTF-8
Content-Length: 18447
Server: grlc/1.0.0
Link: <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=2>; rel=next, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=889>; rel=last

~ curl -X GET -H"Accept: text/csv" -I http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=3

HTTP/1.0 200 OK
Content-Type: text/csv; charset=UTF-8
Content-Length: 18142
Server: grlc/1.0.0
Link: <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=4>; rel=next, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=2>; rel=prev, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=1>; rel=first, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=889>; rel=last
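A consuming application can follow these pages by reading the Link header. The sketch below parses a header value of the shape shown in the responses above into a rel-to-URL dictionary using only the standard library; `parse_link_header` is a hypothetical helper, and the regular expression only covers this simple `<url>; rel=name` form, not every variant the HTTP specification allows.

```python
import re

# Matches '<url>; rel=name' with or without quotes around the rel value.
LINK_PATTERN = re.compile(r'<([^>]+)>\s*;\s*rel="?([^",;\s]+)"?')


def parse_link_header(value):
    """Turn a Link header value into a {rel: url} dictionary."""
    return {rel: url for url, rel in LINK_PATTERN.findall(value)}


header = (
    "<http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=2>; rel=next, "
    "<http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=889>; rel=last"
)
links = parse_link_header(header)
next_url = links.get("next")  # None on the last page, where no rel=next is sent
```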


CACHE

• Moved implementation outside of grlc (not its direct responsibility)

• grlc sets HTTP header Cache-Control to public, max-age=900 (15 minutes, customizable)

• nginx caches all grlc generated JSON (and other static/dynamic assets)

• nginx becomes part of the bundle
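To make the header value concrete, the sketch below builds the `Cache-Control` string quoted above and shows how a cache in front of grlc could decide whether a stored response is still fresh. It is a simplified illustration with hypothetical function names, not how nginx or grlc implement it.

```python
def cache_control(max_age=900):
    """Header value from the slide: publicly cacheable for max_age seconds."""
    return f"public, max-age={max_age}"


def max_age_of(header):
    """Extract the max-age directive in seconds, or None if absent."""
    for directive in header.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def is_fresh(age_seconds, header):
    """True while a cached response is younger than its max-age."""
    limit = max_age_of(header)
    return limit is not None and age_seconds < limit


header = cache_control()            # 'public, max-age=900'
still_valid = is_fresh(600, header)  # 10 minutes old, within the 15-minute window
```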



CONTAINER RELEASE

• Uses docker

• Infrastructure-independent install

• Bundles (composes) all required packages (python, python libs, grlc, nginx). Can be easily extended to more

• Publicly available at hub.docker.com

• One-command server deploy: docker pull clariah/grlc



CONCLUSIONS

• The spectrum of Linked Data clients: SPARQL intensive applications vs RESTful API applications
• grlc decouples SPARQL from all client applications (including LDA), a powerful practice
• Separates query curation workflows from everything else
• Allows at the same time
> Web-friendly SPARQL queries
> Web-friendly RESTful APIs
• Helps you to easily organise your LDA – just organise your SPARQL repository and you’re set
• Try it out!
> http://grlc.io/
> https://github.com/CLARIAH/grlc


Finish with the curl -X GET that gives the result of the original query in the crappy script


THANK YOU!

@ALBERTMERONYO

DATALEGEND.NET

CLARIAH.NL