Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

Aalborg University Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries Kim Ahlstrøm Jakobsen Alex B. Andersen Katja Hose Torben Bach Pedersen Database Technology, Department of Computer Science, Aalborg University Kim Ahlstrøm Jakobsen Optimizing RDF Data Cubes 1 / 19

Transcript of Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

Page 1: Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

Aalborg University

Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

Kim Ahlstrøm Jakobsen, Alex B. Andersen

Katja Hose, Torben Bach Pedersen

Database Technology, Department of Computer Science,

Aalborg University

Kim Ahlstrøm Jakobsen Optimizing RDF Data Cubes 1 / 19

Page 2: Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

Motivation

Page 8: Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

Future Goal

Goal

Analytical queries on internal data & external linked data

Benefits

Enables exploratory queries

Increasing amount of linked data

Integrates with heterogeneous data

Semantic reasoning

Page 10: Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

The First Steps

Efficient processing of analytical queries on RDF data cubes.

Denormalize the cube dimensions

Reduce the subject-object joins (expensive)

Increase the subject-subject joins
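To make the join terminology concrete, the following is a minimal Python sketch that classifies the joins in a basic graph pattern. It is an illustration only: the helper `count_joins` and the property names (`:extendedprice`, `:hasorder`, `:nation_name`) are assumptions made for this example, and object-object joins are ignored.

```python
from itertools import combinations


def is_var(term):
    """A term is a variable if it starts with '?', as in SPARQL."""
    return term.startswith("?")


def count_joins(patterns):
    """Count subject-subject and subject-object joins between triple patterns.

    Each pattern is a (subject, predicate, object) tuple of strings.
    """
    counts = {"subject-subject": 0, "subject-object": 0}
    for (s1, _, o1), (s2, _, o2) in combinations(patterns, 2):
        # Same variable used as the subject of both patterns.
        if is_var(s1) and s1 == s2:
            counts["subject-subject"] += 1
        # Subject variable of one pattern is the object of the other.
        if (is_var(s1) and s1 == o2) or (is_var(s2) and s2 == o1):
            counts["subject-object"] += 1
    return counts


# Normalized (snowflake-like) chain: each hop to a higher level
# adds one subject-object join. Property names are hypothetical.
snowflake = [
    ("?lineitem", ":extendedprice", "?price"),
    ("?lineitem", ":hasorder", "?order"),
    ("?order", "skos:broader", "?customer"),
    ("?customer", "skos:broader", "?nation"),
    ("?nation", ":name", "?name"),
]

# Denormalized dimension: the nation name is stored on the order itself.
star = [
    ("?lineitem", ":extendedprice", "?price"),
    ("?lineitem", ":hasorder", "?order"),
    ("?order", ":nation_name", "?name"),
]

# Fully denormalized: everything hangs off the fact.
denormalized = [
    ("?lineitem", ":extendedprice", "?price"),
    ("?lineitem", ":nation_name", "?name"),
]

for label, bgp in [("snowflake", snowflake), ("star", star),
                   ("denormalized", denormalized)]:
    print(label, count_joins(bgp))
```

In this toy setting the chain needs three subject-object joins, the star-like form needs one, and the fully denormalized form needs none.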

Page 12: Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

Workflow

Internal optimization

Page 21: Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

Building the Cube

Purpose

Organize data for the purpose of analysis

Easier to understand

What is a cube

Facts: The subject of the analysis

Dimensions: Perspectives of the data

Levels: Concepts in the dimensions
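As a rough illustration of these terms, the sketch below models a tiny cube as plain Python tuples: one fact (a line item), and one dimension whose levels are order, customer, and nation, linked by `skos:broader`. All identifiers and values are made up for the example, and the helper names `objects` and `roll_up_path` are hypothetical.

```python
# Toy cube as (subject, predicate, object) triples; all data is invented.
TRIPLES = [
    # Fact: the subject of the analysis, with a measure and a dimension link.
    (":lineitem1", ":extendedprice", 100.0),
    (":lineitem1", ":hasorder", ":order1"),
    # Dimension: members of successive levels, linked by skos:broader.
    (":order1", "skos:broader", ":customer1"),
    (":customer1", "skos:broader", ":nation1"),
    (":nation1", ":name", "Denmark"),
]


def objects(triples, subject, predicate):
    """Return all objects of triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def roll_up_path(triples, member):
    """Follow skos:broader links from a member up to the top of its dimension.

    Assumes the hierarchy is acyclic and each member has at most one parent.
    """
    path = [member]
    while True:
        parents = objects(triples, path[-1], "skos:broader")
        if not parents:
            return path
        path.append(parents[0])


print(roll_up_path(TRIPLES, ":order1"))
```

The printed path lists one member per level, from the level the fact points to up to the coarsest level.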

Page 23: Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

Analytical Queries

Example Query 1

What is the revenue per country?

Example Query 2

What are the top k products bought by customers from Denmark?

Page 25: Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

Patterns: Snowflake Pattern

Page 26: Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

Patterns: Star Pattern

Page 27: Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

Patterns: Fully Denormalized Pattern

Page 28: Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

Special Cases: Unbalanced Hierarchies

Page 29: Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

Special Cases: Property Collision

Page 32: Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

Semantic Web OLAP Denormalization Algorithm

Input

QB4OLAP ontology

Snowflake pattern RDF data cube

Output

Star pattern RDF data cube

Fully Denormalized pattern RDF data cube

Features

Top-down traversal

Property renaming
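The sketch below is not the algorithm from the talk. It is a simplified Python illustration of the two listed features, under stated assumptions: the hierarchy is given by `skos:broader` triples, a hypothetical `level_of` mapping names the level of each member, and renaming prefixes each copied property with its level name (so `:name` on a nation becomes `:nation_name`). The function name `denormalize_to_star` is likewise made up.

```python
from collections import defaultdict

BROADER = "skos:broader"


def denormalize_to_star(triples, level_of):
    """Push attributes of higher-level members down to the lowest members.

    triples:  list of (subject, predicate, object) tuples
    level_of: dict mapping each level member to its level name
    Members without children receive the renamed attributes of all their
    ancestors; skos:broader links and higher-level members are dropped.
    """
    children = defaultdict(list)
    attributes = defaultdict(list)
    has_parent = set()
    result = []
    for s, p, o in triples:
        if p == BROADER:
            children[o].append(s)
            has_parent.add(s)
        elif s in level_of:
            attributes[s].append((p, o))
        else:
            result.append((s, p, o))  # fact triples are kept unchanged

    # Top-down traversal: start at members with no parent, walk to children.
    stack = [(m, []) for m in level_of if m not in has_parent]
    while stack:
        member, inherited = stack.pop()
        own = attributes[member]
        if not children[member]:
            result.extend((member, p, o) for p, o in own + inherited)
        # Property renaming: prefix with the level name to avoid collisions,
        # e.g. both customer and nation may carry a ':name' property.
        renamed = [(f":{level_of[member]}_{p.split(':')[-1]}", o)
                   for p, o in own]
        for child in children[member]:
            stack.append((child, inherited + renamed))
    return result


# Invented toy data in which ':name' would collide without renaming.
snowflake = [
    (":lineitem1", ":extendedprice", 100.0),
    (":lineitem1", ":hasorder", ":order1"),
    (":order1", BROADER, ":customer1"),
    (":customer1", BROADER, ":nation1"),
    (":customer1", ":name", "Alice"),
    (":nation1", ":name", "Denmark"),
]
levels = {":order1": "order", ":customer1": "customer", ":nation1": "nation"}

star = denormalize_to_star(snowflake, levels)
for triple in star:
    print(triple)
```

Note that a member of an unbalanced hierarchy that has no children is treated here like a lowest-level member; the special cases named in the talk would need more careful handling than this sketch provides.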

Page 33: Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

Unbalanced Hierarchies Example

Page 39: Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

Query rewriting

SELECT ?name (SUM(?price) AS ?revenue)
WHERE {
  ?lineitem :extendedprice ?price ;
            :hasorder ?order .
  ?order skos:broader ?customer .
  ?customer skos:broader ?nation .
  ?nation :name ?name .
}
GROUP BY ?name

Page 40: Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

Query rewriting

SELECT ?name (SUM(?price) AS ?revenue)
WHERE {
  ?lineitem :extendedprice ?price ;
            :hasorder ?order .
  ?order skos:broader ?customer .
  ?customer skos:broader ?nation .
  ?nation :name ?name .
}
GROUP BY ?name

SELECT ?name (SUM(?price) AS ?revenue)
WHERE {
  ?lineitem :extendedprice ?price ;
            :hasorder ?order .
  ?order :nation_name ?name .
}
GROUP BY ?name
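To check that such a rewriting preserves the answer, one can evaluate both graph patterns over matching toy data. The following Python sketch does this with a naive nested-loop matcher. The data, the helpers `match` and `revenue_per_name`, and the exact property names are assumptions for illustration; a real triple store would plan and execute these joins very differently.

```python
from collections import defaultdict


def match(patterns, triples, binding=None):
    """Yield variable bindings satisfying all triple patterns (naive join)."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it agrees with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)


def revenue_per_name(patterns, triples):
    """Emulate SUM(?price) ... GROUP BY ?name over the matched bindings."""
    totals = defaultdict(float)
    for b in match(patterns, triples):
        totals[b["?name"]] += b["?price"]
    return dict(totals)


# Invented fact triples shared by both layouts.
facts = [
    (":li1", ":extendedprice", 100.0), (":li1", ":hasorder", ":o1"),
    (":li2", ":extendedprice", 50.0), (":li2", ":hasorder", ":o2"),
    (":li3", ":extendedprice", 25.0), (":li3", ":hasorder", ":o1"),
]
snowflake_data = facts + [
    (":o1", "skos:broader", ":c1"), (":o2", "skos:broader", ":c2"),
    (":c1", "skos:broader", ":n1"), (":c2", "skos:broader", ":n2"),
    (":n1", ":name", "Denmark"), (":n2", ":name", "Germany"),
]
star_data = facts + [
    (":o1", ":nation_name", "Denmark"), (":o2", ":nation_name", "Germany"),
]

original_query = [
    ("?lineitem", ":extendedprice", "?price"),
    ("?lineitem", ":hasorder", "?order"),
    ("?order", "skos:broader", "?customer"),
    ("?customer", "skos:broader", "?nation"),
    ("?nation", ":name", "?name"),
]
rewritten_query = [
    ("?lineitem", ":extendedprice", "?price"),
    ("?lineitem", ":hasorder", "?order"),
    ("?order", ":nation_name", "?name"),
]

print(revenue_per_name(original_query, snowflake_data))
print(revenue_per_name(rewritten_query, star_data))
```

Both calls yield the same totals per nation on this toy data, while the rewritten pattern has two fewer triple patterns to join.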

Page 41: Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

Results

Virtuoso

                                   Star     Denormalized
Increase in Triples                16 %     173 %
Avg. Decrease in Query Time        600 %    700 %
Geo. Mean Decrease in Query Time   110 %    140 %

Cost of triple storage

Static and frequently changing data

Page 42: Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

Future Work

More cube optimizations

Consider data provenance and quality

Page 43: Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

Thank you

Page 44: Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

SWOD Abstract Example

Page 57: Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

Figure Credits

Workman – Licence: CC BY 3.0, Credit: www.clipartbest.com

Cube – Licence: CC BY 3.0, Credit: www.clipartbest.com

Turing machine: http://www.felienne.com/

Steps: http://www.cliparthut.com/

Future-work: http://www.horsesforsources.com/
