BRIDGING THE GAP BETWEEN
RESTFUL APIS AND LINKED DATA
Albert Meroño-Peñuela
Rinke Hoekstra
& many others
CLARIAH Tech Day
07-10-2016
Vrije Universiteit Amsterdam
ACCESSING LINKED DATA
• Multiple Linked Data consuming applications
• Variety of access interfaces needed
GITHUB AS A HUB OF SPARQL QUERIES
• One .rq file per SPARQL query
• Good support for query curation processes
> Versioning
> Branching
> Clone-pull-push
• Web-friendly features!
> One URI per query
> Uniquely identifiable
> De-referenceable (raw.githubusercontent.com)
Rinke: this is an asset in itself. We need to be able to keep the queries we use to answer research questions, for reproducibility.
MEANWHILE IN THE SEMANTIC WEB…
• Linked Data APIs emerge
• RESTful entry point to Linked Data hubs for Web applications
• OpenPHACTS
• …but the Linked Data API (e.g. Swagger spec, code itself) still needs to be coded and maintained
• Cousin of BASIL in a SALAD
• Same basic principle: 1 SPARQL query = 1 API operation
• Automatically builds Swagger spec and UI from SPARQL
But:
• External query management
• Organization of SPARQL queries in the GitHub repo matches organization of the API
• Thin layer – nothing stored server-side
• Maps
> GitHub API
> Swagger spec
Meroño & Hoekstra. ‘grlc Makes GitHub Taste Like Linked Data APIs’. SALAD, ESWC (2016)
MAPPING GITHUB AND SWAGGER
SPARQL DECORATOR SYNTAX
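The slide for this section is an image, so the exact syntax is not reproduced in the text. As a minimal sketch, assuming decorators are SPARQL comment lines that start with `#+` and carry a `key: value` pair (the form of the `#+ pagination: 100` decorator shown in the pagination section), they could be read as follows. The helper name `parse_decorators`, the sample query, and its `summary` key are illustrative assumptions, not grlc's actual parser.

```python
def parse_decorators(query_text):
    """Collect '#+ key: value' comment lines from a SPARQL query (hypothetical helper)."""
    decorators = {}
    for line in query_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("#+"):
            continue  # not a decorator line
        # Split on the first colon only, so values such as URLs stay intact.
        key, sep, value = stripped[2:].partition(":")
        if sep:  # ignore '#+' lines that have no colon
            decorators[key.strip()] = value.strip()
    return decorators


# Illustrative query; only the 'pagination' decorator appears in the slides.
SAMPLE_QUERY = """\
#+ summary: List all house types
#+ pagination: 100
SELECT ?type WHERE { ?s a ?type }
"""

decorators = parse_decorators(SAMPLE_QUERY)
```

Values are kept as plain strings here; a real implementation would likely need typed values and lists.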
THE GRLC SERVICE
• Assuming your repo is at https://github.com/:owner/:repo and your grlc instance is at :host,
> http://:host/api/:owner/:repo/spec returns the JSON Swagger spec
> http://:host/api/:owner/:repo/api-docs returns the Swagger UI
> http://:host/api/:owner/:repo/:operation?p_1=v_1...p_n=v_n calls the operation with the specified parameter values
> Uses BASIL’s SPARQL variable name convention for query parameters
• Sends requests to
> https://api.github.com/repos/:owner/:repo to look for SPARQL queries and their decorators
> https://raw.githubusercontent.com/:owner/:repo/master/file.rq to dereference queries, get the SPARQL, and parse it
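The URL scheme above can be sketched as a few string-building helpers. This is only an illustration of the route layout listed on the slide; the function names and the `branch` parameter are assumptions, and nothing here contacts a server.

```python
from urllib.parse import urlencode


def grlc_urls(host, owner, repo):
    """Spec and UI routes of a grlc instance for one GitHub repo."""
    base = f"http://{host}/api/{owner}/{repo}"
    return {"spec": f"{base}/spec", "api-docs": f"{base}/api-docs"}


def operation_url(host, owner, repo, operation, params=None):
    """Route that calls one operation (one SPARQL query) with parameter values."""
    url = f"http://{host}/api/{owner}/{repo}/{operation}"
    if params:
        url += "?" + urlencode(params)
    return url


def github_urls(owner, repo, filename, branch="master"):
    """GitHub routes used to list a repo and to dereference one query file."""
    return {
        "repo_api": f"https://api.github.com/repos/{owner}/{repo}",
        "raw_query": f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{filename}",
    }


# Example values taken from the pagination slide of this deck.
example = operation_url("localhost:8088", "CEDAR-project", "Queries",
                        "houseType_all", {"page": 3})
```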
DROPDOWNS
• Fills in the swag[paths][op][method][parameters][enum] array
• Uses the de-contextualized triple pattern of the SPARQL query’s BGP against the same SPARQL endpoint
• Very inefficient
• JSON spec caching via reverse proxy
• LOD cache
• Own dimension/codelist cache
• Unmapped parameter ambiguity if the user wants to mix enum with arbitrary parameter values (“all values”)
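A minimal sketch of the first bullet, assuming the spec is held as a nested Python dict in which `parameters` is a list of parameter objects (as in Swagger 2.0). The helper name, the sample spec, and the enum values are hypothetical; in grlc the values would come from running the triple pattern against the SPARQL endpoint, which is not done here.

```python
def set_parameter_enum(swag, path, method, param_name, values):
    """Attach an enum list to one named parameter of one operation.

    Returns True if the parameter was found, False otherwise.
    """
    for param in swag["paths"][path][method]["parameters"]:
        if param["name"] == param_name:
            # De-duplicate and sort so the dropdown is stable across calls.
            param["enum"] = sorted(set(values))
            return True
    return False


# Hypothetical spec fragment and values, for illustration only.
swag = {
    "paths": {
        "/houseType_all": {
            "get": {
                "parameters": [
                    {"name": "type", "in": "query", "type": "string"}
                ]
            }
        }
    }
}

found = set_parameter_enum(swag, "/houseType_all", "get", "type",
                           ["farm", "ship", "farm"])
```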
CONTENT NEGOTIATION
• API endpoints can now end with .content_type (e.g. grlc.io/CLARIAH/wp-queries/MyQuery.csv)
• Supports .csv, .json, .html (can be extended)
• grlc sets the ‘Accept’ HTTP header and agnostically returns the same ‘Content-Type’ as the SPARQL endpoint
• Up to the SPARQL endpoint to accept it
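The extension handling above could be sketched as a lookup from file extension to `Accept` value. The media types in the table are assumptions (only `text/csv` appears elsewhere in this deck), and `split_operation` is a hypothetical helper name.

```python
import posixpath

# Assumed mapping; the slide only names the extensions, not the media types.
EXTENSION_TO_ACCEPT = {
    ".csv": "text/csv",
    ".json": "application/json",
    ".html": "text/html",
}


def split_operation(name):
    """Split 'MyQuery.csv' into ('MyQuery', 'text/csv').

    Names without a known extension are returned unchanged with None,
    leaving the Accept header to whatever the client sent.
    """
    root, ext = posixpath.splitext(name)
    if ext in EXTENSION_TO_ACCEPT:
        return root, EXTENSION_TO_ACCEPT[ext]
    return name, None


operation, accept = split_operation("MyQuery.csv")
```

Extending the supported formats then amounts to adding an entry to the table.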
PAGINATION
• Large query results are typically nasty for consuming applications
• Split the result into multiple parts (or “pages”)
• Size? #+ pagination: 100
• Navigating pages
• rel=next,prev,first,last links in the HTTP headers (GitHub API Traversal convention)
• Extra request parameter ?page (defaults to 1)
~ curl -X GET -H "Accept: text/csv" -I http://localhost:8088/api/CEDAR-project/Queries/houseType_all
HTTP/1.0 200 OK
Content-Type: text/csv; charset=UTF-8
Content-Length: 18447
Server: grlc/1.0.0
Link: <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=2>; rel=next,
  <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=889>; rel=last

~ curl -X GET -H "Accept: text/csv" -I http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=3
HTTP/1.0 200 OK
Content-Type: text/csv; charset=UTF-8
Content-Length: 18142
Server: grlc/1.0.0
Link: <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=4>; rel=next,
  <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=2>; rel=prev,
  <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=1>; rel=first,
  <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=889>; rel=last
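The `Link` headers in the two responses above follow a simple pattern, sketched here as a function of the current page and the last page. The function name is hypothetical, and the behaviour on the last page (no `next` or `last` link) is an assumption, since the slides only show pages 1 and 3.

```python
def build_link_header(base_url, page, last_page):
    """Build a Link header value with next/prev/first/last relations."""

    def link(target, rel):
        return f"<{base_url}?page={target}>; rel={rel}"

    links = []
    if page < last_page:
        links.append(link(page + 1, "next"))
    if page > 1:
        links.append(link(page - 1, "prev"))
        links.append(link(1, "first"))
    if page < last_page:
        links.append(link(last_page, "last"))
    return ", ".join(links)


BASE = "http://localhost:8088/api/CEDAR-project/Queries/houseType_all"
first_page_links = build_link_header(BASE, 1, 889)
third_page_links = build_link_header(BASE, 3, 889)
```

A client can follow `rel=next` until the header no longer contains it, without knowing the page size.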
CACHE
• Moved implementation outside of grlc (not its direct responsibility)
• grlc sets the HTTP header Cache-Control to public, max-age=900 (15 minutes, customizable)
• nginx caches all grlc-generated JSON (and other static/dynamic assets)
• nginx becomes part of the bundle
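As a small sketch of the header described above: one helper that produces the `Cache-Control` value with a configurable lifetime, and one that reads the lifetime back, as a caching proxy would need to. Both function names are hypothetical and this is not grlc's or nginx's code.

```python
import re


def cache_control_header(max_age=900):
    """Header marking a response as publicly cacheable for max_age seconds."""
    return {"Cache-Control": f"public, max-age={max_age}"}


def max_age_seconds(header_value):
    """Extract the max-age directive from a Cache-Control value, or None."""
    match = re.search(r"max-age=(\d+)", header_value)
    return int(match.group(1)) if match else None


headers = cache_control_header()
lifetime = max_age_seconds(headers["Cache-Control"])
```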
CONTAINER RELEASE
• Uses Docker
• Infrastructure-independent install
• Bundles (composes) all required packages (python, python libs, grlc, nginx). Can be easily extended to more
• Publicly available at hub.docker.com
• One-command server deploy: docker pull clariah/grlc
CONCLUSIONS
The spectrum of Linked Data clients: SPARQL-intensive applications vs RESTful API applications
grlc uses decoupling of SPARQL from all client applications (including LDA) as a powerful practice
• Separates query curation workflows from everything else
• Allows at the same time
> Web-friendly SPARQL queries
> Web-friendly RESTful APIs
• Helps you to easily organise your LDA – just organise your SPARQL repository and you’re set
• Try it out!
> http://grlc.io/
> https://github.com/CLARIAH/grlc
Finish with the curl -X GET that gives the result of the original query in the crappy script
THANK YOU!
@ALBERTMERONYO
DATALEGEND.NET
CLARIAH.NL

grlc: Bridging the Gap Between RESTful APIs and Linked Data

Editor's Notes

  • #2 Addresses that start with HTTP
  • #5 2 problems: (1) Technical knowledge (of SPARQL) is required; (2) Bad practice of hard-coding queries