Aalborg University
Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries
Kim Ahlstrøm Jakobsen, Alex B. Andersen,
Katja Hose, Torben Bach Pedersen
Database Technology, Department of Computer Science,
Aalborg University
Kim Ahlstrøm Jakobsen Optimizing RDF Data Cubes 1 / 19
Motivation
Future Goal
Goal
Analytical queries on internal data & external linked data
Benefits
Enables exploratory queries
Increasing amount of linked data
Integrates with heterogeneous data
Semantic reasoning
The First Steps
Efficient processing of analytical queries on RDF data cubes.
Denormalize the cube dimensions
Reduce the subject-object joins (expensive)
Increase the subject-subject joins
Workflow
Internal optimization
Building the Cube
Purpose
Organize data for the purpose of analysis
Easier to understand
What is a cube
Facts: The subject of the analysis
Dimensions: Perspectives of the data
Levels: Concepts in the dimensions
Analytical Queries
Example Query 1
What is the revenue per country?
Example Query 2
What are the top k products bought by customers from Denmark?
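The two example questions can be sketched as plain aggregations over fact rows. This is only an illustration: the rows, field names, and values below are hypothetical and not taken from the talk, and ranking products by number of purchases is an assumed reading of "top k". A real RDF data cube would be queried with SPARQL instead.

```python
from collections import defaultdict

# Hypothetical fact rows: one row per line item, already joined with its dimensions.
FACTS = [
    {"product": "bike", "country": "Denmark", "revenue": 300.0},
    {"product": "helmet", "country": "Denmark", "revenue": 50.0},
    {"product": "bike", "country": "Germany", "revenue": 280.0},
    {"product": "lamp", "country": "Denmark", "revenue": 20.0},
    {"product": "helmet", "country": "Denmark", "revenue": 45.0},
]


def revenue_per_country(facts):
    """Example Query 1: sum the revenue measure grouped by country."""
    totals = defaultdict(float)
    for row in facts:
        totals[row["country"]] += row["revenue"]
    return dict(totals)


def top_k_products(facts, country, k):
    """Example Query 2: the k products bought most often by customers from `country`."""
    counts = defaultdict(int)
    for row in facts:
        if row["country"] == country:
            counts[row["product"]] += 1
    # Sort by descending count, then by name so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [product for product, _ in ranked[:k]]
```

Both functions group by a dimension level (country, product) and aggregate over the facts, which is the shape of query the cube is meant to serve.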
Patterns: Snowflake Pattern
Patterns: Star Pattern
Patterns: Fully Denormalized Pattern
Special Cases: Unbalanced Hierarchies
Special Cases: Property Collision
Semantic Web OLAP Denormalization Algorithm
Input
QB4OLAP ontology
Snowflake pattern RDF data cube
Output
Star pattern RDF data cube
Fully Denormalized pattern RDF data cube
Features
Top-down traversal
Property renaming
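The snowflake-to-star step with property renaming can be sketched on plain triples as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: triples are Python tuples, `:inLevel` is a hypothetical predicate that names each member's level (it stands in for the level information the QB4OLAP ontology would supply), every member has at most one `skos:broader` parent, and the sample data is invented.

```python
BROADER = "skos:broader"
IN_LEVEL = ":inLevel"  # hypothetical predicate: member -> name of its level


def snowflake_to_star(triples, base_level):
    """Copy the attributes of all ancestors onto each base-level member."""
    level_of = {s: o for s, p, o in triples if p == IN_LEVEL}
    parent_of = {s: o for s, p, o in triples if p == BROADER}
    attributes = {}
    for s, p, o in triples:
        if p not in (BROADER, IN_LEVEL) and s in level_of:
            attributes.setdefault(s, []).append((p, o))

    # Triples whose subject is not a dimension member (the facts) are kept as-is.
    star = [t for t in triples if t[0] not in level_of]
    for member, level in level_of.items():
        if level != base_level:
            continue
        star.extend((member, p, o) for p, o in attributes.get(member, []))
        seen = {member}
        ancestor = parent_of.get(member)
        while ancestor is not None and ancestor not in seen:
            seen.add(ancestor)
            prefix = level_of.get(ancestor, "upper")
            for p, o in attributes.get(ancestor, []):
                # Renaming avoids collisions: ":name" on a nation becomes ":nation_name".
                star.append((member, ":" + prefix + "_" + p.lstrip(":"), o))
            ancestor = parent_of.get(ancestor)
    return star


# Invented sample data in the snowflake pattern.
SNOWFLAKE = [
    (":lineitem1", ":extendedprice", 100.0),
    (":lineitem1", ":hasorder", ":order1"),
    (":order1", IN_LEVEL, "order"),
    (":order1", BROADER, ":customer1"),
    (":customer1", IN_LEVEL, "customer"),
    (":customer1", ":name", "Alice"),
    (":customer1", BROADER, ":nation1"),
    (":nation1", IN_LEVEL, "nation"),
    (":nation1", ":name", "Denmark"),
]
STAR = snowflake_to_star(SNOWFLAKE, base_level="order")
```

In this sample both the customer and the nation carry a `:name` property. Prefixing each copied property with its level name keeps the two values apart once they sit on the same order node, and the `skos:broader` chain is no longer needed to reach them.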
Unbalanced Hierarchies Example
Query rewriting
Original query:

    SELECT ?name sum(?price)
    WHERE {
      ?lineitem :extendedprice ?price ;
                :hasorder ?order .
      ?order skos:broader ?customer .
      ?customer skos:broader ?nation .
      ?nation :name ?name .
    }
    GROUP BY ?name

Rewritten query:

    SELECT ?name sum(?price)
    WHERE {
      ?lineitem :extendedprice ?price ;
                :hasorder ?order .
      ?order :nation_name ?name .
    }
    GROUP BY ?name
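To see what the rewriting changes in terms of joins, the triple patterns of both queries can be counted mechanically. The sketch below is an illustration only: the patterns are transcribed by hand from the slide, the property spellings (`:extendedprice`, `:hasorder`, `:nation_name`) are assumed readings of the extracted text, and `count_joins` is a hypothetical helper, not part of any SPARQL engine.

```python
from collections import defaultdict


def count_joins(patterns):
    """Count subject-subject and subject-object joins between triple patterns.

    A variable shared by two patterns gives a subject-subject join when it is
    the subject of both, and a subject-object join when it is the subject of
    one and the object of the other.
    """
    as_subject = defaultdict(int)
    as_object = defaultdict(int)
    for s, _p, o in patterns:
        if s.startswith("?"):
            as_subject[s] += 1
        if o.startswith("?"):
            as_object[o] += 1
    subject_subject = sum(n * (n - 1) // 2 for n in as_subject.values())
    subject_object = sum(
        count * as_object[var] for var, count in as_subject.items() if var in as_object
    )
    return {"subject-subject": subject_subject, "subject-object": subject_object}


ORIGINAL = [
    ("?lineitem", ":extendedprice", "?price"),
    ("?lineitem", ":hasorder", "?order"),
    ("?order", "skos:broader", "?customer"),
    ("?customer", "skos:broader", "?nation"),
    ("?nation", ":name", "?name"),
]
REWRITTEN = [
    ("?lineitem", ":extendedprice", "?price"),
    ("?lineitem", ":hasorder", "?order"),
    ("?order", ":nation_name", "?name"),
]
```

For these two pattern lists the count gives one subject-subject join in each query, and three subject-object joins before the rewriting against one after it.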
Results
Virtuoso
                                        Star     Denormalized
Increase in Triples                     16 %     173 %
Avg. Decrease in Query Time             600 %    700 %
Geometric Mean Decrease in Query Time   110 %    140 %
Cost of triple storage
Static and frequently changing data
Future Work
More cube optimizations
Consider data provenance and quality
Thank you
SWOD Abstract Example
Figure Credits
Workman – Licence: CC BY 3.0, Credit: www.clipartbest.com
Cube – Licence: CC BY 3.0, Credit: www.clipartbest.com
Turing machine – http://www.felienne.com/
Steps – http://www.cliparthut.com/
Future-work – http://www.horsesforsources.com/