Scaling out federated queries for Life Sciences Data In Production

Dieter De Witte, Laurens De Vocht, et al.

dieter.dewitte@ugent.be

• IMEC – IDLAB – GHENT UNIVERSITY

• ONTOFORCE

Catch 22!?

A. No Semantic Web Applications

because no Semantic Data

B. No Semantic Data

because no applications

A. The LOD Cloud for Life Sciences...

Ontoforce’s DISQOVER covers Life Sciences Datasets

B. DISQOVER is an Exploratory Semantic Search UI (faceted browsing)

To Click = To SPARQL
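
One way to picture "To Click = To SPARQL" is a function that turns the currently selected facets into triple patterns. The sketch below is only an illustration under stated assumptions: the function name `facets_to_sparql` and the `example.org` predicate and value IRIs are hypothetical and are not taken from DISQOVER.

```python
def facets_to_sparql(selected, limit=10):
    """Build a SELECT query from facet selections.

    `selected` maps a predicate IRI to the chosen value IRI.
    Both are hypothetical placeholders, not DISQOVER's vocabulary.
    """
    # Each clicked facet becomes one triple pattern on the same ?item variable.
    patterns = [
        f"  ?item <{pred}> <{value}> ."
        for pred, value in sorted(selected.items())
    ]
    body = "\n".join(patterns)
    return f"SELECT DISTINCT ?item WHERE {{\n{body}\n}} LIMIT {limit}"


# Two facet clicks: "type = Drug" and "treats = asthma" (illustrative IRIs).
query = facets_to_sparql({
    "http://example.org/vocab/type": "http://example.org/vocab/Drug",
    "http://example.org/vocab/treats": "http://example.org/disease/asthma",
})
print(query)
```

Every additional click adds one more pattern, which is why a long browsing session can produce large queries.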

The missing link in our catch 22? “How to run federated queries?”

Direct ETL

• Cloud Instances

• PAGO (pay-as-you-go) AMIs:

Scientific Benchmark = Reproducible Benchmark

Benchmark Client

• 1 single-threaded warm-up run (all 1,223 queries)

• 1 multi-threaded run (8 threads, each with a randomized query order)
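
The two phases of the benchmark client can be sketched as follows. This is a minimal illustration, not the authors' benchmark software: `run_query` is a placeholder that does not contact any endpoint, and the seeding scheme and the short query list are assumptions made so the sketch runs on its own.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(query):
    """Placeholder executor (assumption): swap in a real SPARQL HTTP call."""
    start = time.perf_counter()
    _ = len(query)  # stand-in for sending the query and reading the result
    return time.perf_counter() - start


def run_sequence(queries, seed=None):
    """Run every query once; shuffle the order when a seed is given."""
    order = list(queries)
    if seed is not None:
        random.Random(seed).shuffle(order)
    return [(q, run_query(q)) for q in order]


def benchmark(queries, threads=8):
    # Phase 1: single-threaded warm-up over all queries, in the given order.
    warmup = run_sequence(queries)
    # Phase 2: one randomized pass over the full query set per worker task.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        stress = list(
            pool.map(lambda s: run_sequence(queries, seed=s), range(threads))
        )
    return warmup, stress


# Illustrative stand-in for the real query set.
queries = [f"SELECT * WHERE {{ ?s ?p ?o }} LIMIT {i + 1}" for i in range(20)]
warmup, stress = benchmark(queries)
```

Fixing the shuffle seeds keeps the randomized orders repeatable between runs, which fits the goal of a reproducible benchmark.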

Database Node(s)

How to evaluate an RDF Database solution?

Performance(Data store, Dataset, Configuration, Number of nodes, Hardware (RAM))

Performance(NoSQL Triple stores, Watdiv (10M, 100M, 1000M), Standard Configs, Single Node, 32 GB RAM)

SIGMOD 2016: Single Node SOTA on artificial data

More data, more problems

Query performance: Virtuoso Leads, Blazegraph follows

SWAT4LS 2016: Multi-node SOTA on real data

Performance(Scale out systems, DISQOVER data, Optimized Configs, Multi-Node, Compression, 64 GB RAM)

How to deal with Big Linked Data?

1. Vertical Scaling: bigger box

2. Compression: smaller content

3. Horizontal Scaling: more boxes, 1 location

4. Federation: more boxes, more locations

V1, Bla1 (single node Virtuoso, Blazegraph)

V1_32 (32GB Virtuoso)

Fu1 (Fuseki + HDT)

V3 (Virtuoso cluster 3 nodes)

Fl3 (FluidOps, aka FedX)
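
For the federation option, one standard way to spread a query over several locations is the SPARQL 1.1 `SERVICE` keyword. The sketch below only builds such a query string; the helper name `federated_query` and the `example.org` endpoint URLs and predicates are hypothetical. A federation engine such as FedX typically selects sources itself, so explicit `SERVICE` clauses are shown here purely to make the idea concrete.

```python
def federated_query(pattern_by_endpoint, limit=10):
    """Wrap each graph pattern in a SERVICE block for its endpoint.

    `pattern_by_endpoint` maps an endpoint URL to the pattern it should
    evaluate. Endpoints and patterns here are illustrative assumptions.
    """
    blocks = [
        f"  SERVICE <{endpoint}> {{ {pattern} }}"
        for endpoint, pattern in pattern_by_endpoint.items()
    ]
    body = "\n".join(blocks)
    return f"SELECT * WHERE {{\n{body}\n}} LIMIT {limit}"


# Two remote endpoints joined on the shared ?drug variable.
query = federated_query({
    "http://example.org/drugs/sparql":
        "?drug <http://example.org/vocab/targets> ?protein .",
    "http://example.org/trials/sparql":
        "?trial <http://example.org/vocab/studies> ?drug .",
})
print(query)
```

The join on `?drug` happens across endpoints, which is where federated execution tends to get expensive.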

DISQOVER dataset ... and queries

Count, Union, Sort, Aggregations

Example Query 1: Nesting, FILTERs, unbound triples

Example Query 4: Aggregations, Optionals

Initial performance results were counter-intuitive... and incorrect!!!

Worse hardware, better performance?

Only Virtuoso-backed systems survive multi-threaded benchmark

marks last successful query (no timeout)

1 x 1,223 queries (single-threaded) vs. 8 x 1,223 queries (multi-threaded)

No errors but incorrect #results!!!
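
A simple check for this failure mode is to compare the number of results each system returns for the same query and flag any disagreement. The sketch below assumes result counts have already been collected per system; the system labels `A`, `B`, `C` and all counts are made-up illustration data, not measurements from this benchmark.

```python
def flag_count_mismatches(counts_by_system):
    """Return {query_id: {system: count}} for queries where systems disagree.

    `counts_by_system` maps a system name to {query_id: number_of_results}.
    A missing count (for example after a timeout) is recorded as None.
    """
    query_ids = set().union(*(c.keys() for c in counts_by_system.values()))
    mismatches = {}
    for qid in sorted(query_ids):
        observed = {
            system: counts.get(qid)
            for system, counts in counts_by_system.items()
        }
        # More than one distinct value means at least one system deviates.
        if len(set(observed.values())) > 1:
            mismatches[qid] = observed
    return mismatches


# Made-up counts for illustration only.
counts = {
    "A": {"q1": 10, "q2": 5},
    "B": {"q1": 10, "q2": 0},
    "C": {"q1": 10},
}
mismatches = flag_count_mismatches(counts)
print(mismatches)
```

Agreement on counts does not prove correctness, but a disagreement is a cheap signal that a query returned without error and was still wrong on at least one system.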

FILTERs, UNIONs are challenging but ORDER + GROUP + OPTIONAL dominate

COUNT DISTINCT

600 – 1,223 BGPs

Conclusions & Future Work

• Additional diagnostics for RDF solutions!

• Extend benchmarking software with query correctness assessment!

• Multi-node RDF solutions???

• Towards Full paper:

– NoSQL for Ontoforce Data

– Scale out approaches for Watdiv + test LDF

– Release reusable end-to-end benchmark software:

• Setup AND Postprocessing

Thanks for your attention!!
