Sébastien Jelsch
London, 7-11-2015
Big Data MDX with Mondrian and Apache Kylin
Agenda
▪ OLAP-on-Hadoop with Apache Kylin
▪ Features
▪ Apache Kylin & Mondrian
▪ Conclusion & Discussion
Big Data
Situation
▪ More and more data becoming available on Hadoop
▪ Limitations in existing Business Intelligence Tools
  ○ Limited support for Hadoop
  ○ Data size growing exponentially
  ○ High latency of interactive queries
▪ Challenges to adapt Hadoop for interactive analysis
  ○ OLAP capability on Hadoop ecosystem not ready yet
OLAP and Big Data
Goals
▪ Full OLAP capability and advanced functionality
▪ Interactive analysis with sub-second response times
▪ ANSI SQL or MDX for analysts and engineers
▪ Seamless integration with BI tools
▪ High concurrency with thousands of end users
▪ Distributed, scale-out architecture for large data volumes
What is Apache Kylin?
Solution: Apache Kylin
Extreme OLAP Engine for Big Data
▪ Distributed Analytics Engine from eBay
▪ OLAP-on-Hadoop
▪ Provides SQL interface for multidimensional analysis
▪ Based on Hadoop ecosystem
Open-sourced: 1 October 2014
Accepted into incubation: 25 November 2014
Current version: 1.1 (25 October 2015)
Short introduction to OLAP
OLAP Cube
[Figure: a three-dimensional OLAP cube. One axis lists Beer, Water, Wine; one lists Berlin, Paris, London; one lists 2013, 2014, 2015. The visible face shows the cell values 8, 7, 14 / 12, 22, 19 / 30, 15, 25.]
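To make the cube picture concrete, here is a minimal Python sketch of a three-dimensional cube held in a dictionary, with a roll-up and a slice. The dimension names (`drink`, `city`, `year`) and every cell value are assumptions for illustration; the slide does not say which number belongs to which cell.

```python
from collections import defaultdict

# Hypothetical cells: (drink, city, year) -> units sold.
# Only the member names come from the slide; the values are made up.
CELLS = {
    ("Beer", "Berlin", 2013): 8,
    ("Beer", "Paris", 2013): 7,
    ("Beer", "London", 2013): 14,
    ("Water", "Berlin", 2013): 12,
    ("Water", "Paris", 2014): 22,
    ("Wine", "London", 2015): 25,
}
DIMENSIONS = ("drink", "city", "year")


def roll_up(cells, keep):
    """Aggregate the cube down to the dimensions listed in `keep`."""
    positions = [DIMENSIONS.index(d) for d in keep]
    result = defaultdict(int)
    for key, value in cells.items():
        result[tuple(key[i] for i in positions)] += value
    return dict(result)


def slice_cube(cells, dimension, member):
    """Keep only the cells whose `dimension` coordinate equals `member`."""
    i = DIMENSIONS.index(dimension)
    return {k: v for k, v in cells.items() if k[i] == member}


by_drink = roll_up(CELLS, ["drink"])
beer_by_city = roll_up(slice_cube(CELLS, "drink", "Beer"), ["city"])
```

Each distinct `keep` list corresponds to one pre-aggregated view of the same data, which is the idea the later cuboid slides build on.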
Apache Kylin: Architecture
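[Architecture diagram: 3rd-party apps, web apps, and BI tools send SQL to Kylin through the REST server or via JDBC / ODBC. Inside Kylin, the query engine and a routing layer sit on top of the OLAP cubes stored in HBase (key-value data, low latency) and the metadata. The cube build engine reads star schema data from Hive on HDFS, which is also shown as the mid-latency path.]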
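As a rough illustration of the REST path in the diagram, the following Python sketch builds a SQL query request for Kylin's REST server. The endpoint path `/kylin/api/query` and the JSON fields follow Kylin's REST API as commonly documented, but should be checked against the version in use; the host, credentials, project, table, and column names are all hypothetical.

```python
import base64
import json
import urllib.request


def build_query_request(base_url, user, password, project, sql, limit=100):
    """Build (but do not send) a POST request for Kylin's query endpoint."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    body = json.dumps({"sql": sql, "project": project, "offset": 0, "limit": limit})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/kylin/api/query",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_query(request, timeout=30):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Hypothetical host, credentials, project, and table; nothing is sent here.
request = build_query_request(
    "http://localhost:7070", "ADMIN", "KYLIN", "demo_project",
    "SELECT city, SUM(amount) FROM sales GROUP BY city",
)
```

Calling `run_query(request)` against a running Kylin instance would send the query; building the request separately keeps the sketch inspectable without a cluster.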
Apache Kylin: Cube Designer
Apache Kylin: Monitoring
Apache Kylin: SQL Interface
Apache Kylin and MDX
SQL returns a two-dimensional result set; it was not designed for more dimensions.
Wish:
▪ Multidimensional result set
▪ Consider hierarchies and levels in the data
Query language: MDX
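The gap between the two result shapes can be sketched in Python. The MDX string below is a hypothetical query (the cube `[Sales]`, the dimensions, and the measure are assumptions) that asks for cities on rows and years on columns directly; the `pivot` function shows the extra step needed to get the same layout from flat SQL rows. The row values are made up.

```python
# Hypothetical MDX: the axes are part of the query itself.
MDX = """
SELECT [Year].Members ON COLUMNS,
       [City].Members ON ROWS
FROM [Sales]
WHERE [Measures].[Amount]
"""

# What a SQL GROUP BY city, year would return: one flat row per combination.
SQL_ROWS = [
    ("Berlin", 2013, 8),
    ("Berlin", 2014, 12),
    ("Paris", 2013, 7),
    ("Paris", 2014, 22),
    ("London", 2014, 19),
]


def pivot(rows):
    """Arrange (row member, column member, value) rows as a cross-tab."""
    row_members = sorted({r for r, _, _ in rows})
    column_members = sorted({c for _, c, _ in rows})
    cells = {(r, c): v for r, c, v in rows}
    # Missing combinations become None, like empty cells in a pivot table.
    grid = [[cells.get((r, c)) for c in column_members] for r in row_members]
    return row_members, column_members, grid


row_members, column_members, grid = pivot(SQL_ROWS)
```

With SQL the client has to do this rearrangement itself for every additional axis, while an MDX engine returns the axes as part of the result.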
Pentaho's Mondrian
Mondrian
▪ OLAP engine
▪ Transforms MDX queries into SQL
▪ Multidimensional representation of data
▪ Integrated into Saiku / Pentaho's Business Analytics Platform
▪ Extensible through SQL dialects, e.g. MySQL, Postgres, Hive, Impala, ...
Apache Kylin + Mondrian: Idea
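[Diagram: an OLAP client sends MDX to Mondrian. Mondrian reads a Mondrian schema (XML) that defines measures, dimensions, hierarchies, levels, and attributes, and connects over JDBC, using a Kylin dialect, to Apache Kylin (HBase, cuboids, ...).]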
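The schema is the piece that maps relational columns to cube concepts. The following Python sketch generates a small schema document using Mondrian 3.x-style element names (`Schema`, `Cube`, `Table`, `Dimension`, `Hierarchy`, `Level`, `Measure`); the Mondrian 4 format, which introduces attributes, is structured differently. The cube, table, and column names are hypothetical and not taken from the talk.

```python
import xml.etree.ElementTree as ET


def build_schema(cube_name, fact_table, dimensions, measures):
    """Return a Mondrian 3.x-style schema document as an XML string.

    dimensions: list of (dimension name, fact-table column)
    measures:   list of (measure name, fact-table column, aggregator)
    """
    schema = ET.Element("Schema", name=f"{cube_name}Schema")
    cube = ET.SubElement(schema, "Cube", name=cube_name)
    ET.SubElement(cube, "Table", name=fact_table)
    for dim_name, column in dimensions:
        # Degenerate dimension: the level column lives in the fact table.
        dimension = ET.SubElement(cube, "Dimension", name=dim_name)
        hierarchy = ET.SubElement(dimension, "Hierarchy", hasAll="true")
        ET.SubElement(hierarchy, "Level", name=dim_name, column=column)
    for measure_name, column, aggregator in measures:
        ET.SubElement(
            cube, "Measure",
            name=measure_name, column=column, aggregator=aggregator,
        )
    return ET.tostring(schema, encoding="unicode")


# Hypothetical cube over a single fact table.
schema_xml = build_schema(
    "Sales", "SALES_FACT",
    dimensions=[("City", "CITY"), ("Year", "SALES_YEAR")],
    measures=[("Amount", "AMOUNT", "sum")],
)
```

Mondrian turns an MDX query against such a cube into SQL with `GROUP BY` on the level columns, which is the kind of query a precomputed Kylin cuboid can answer.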
Apache Kylin + Mondrian: Implementation
Work done:
▪ Kylin dialect created
▪ Kylin's JDBC driver optimized
▪ Bugs fixed to get Mondrian working with Kylin
TBD:
▪ Integrate the Kylin dialect into Mondrian's official code*
▪ Make every MDX query executable
Successful tests**:
▪ Current Saiku and Mondrian 4.4
▪ Current Saiku and Mondrian 3.x (not tested thoroughly)
* Pull request: https://github.com/pentaho/mondrian/pull/480
** GitHub project: https://github.com/mustangore/kylin-mondrian-interaction
Apache Kylin + Mondrian: Examples
Apache Kylin: Conclusion
▪ Extremely fast and scalable OLAP Engine
▪ OLAP-on-Hadoop
▪ Depends on Apache Hadoop infrastructure
▪ MOLAP Cube
▪ Incremental refresh of cubes
▪ Integration into existing BI Tools
▪ MDX queries with Mondrian possible (ongoing work)
Contact
Sébastien Jelsch
Big Data Scientist
inovex GmbH
Office Karlsruhe
Ludwig-Erhard-Allee 6
76131 Karlsruhe
Tel: +49 176 - 45786280
E-Mail: [email protected]
@inovexgmbh | @Mustangore
Thank you for your attention
Introduction to OLAP
[Figure: cuboid lattice for four dimensions. Each node is a bit vector marking which dimensions it includes, from the N-cuboid 1,1,1,1 at one end, through the three-, two-, and one-dimensional cuboids, to the 0-cuboid 0,0,0,0 at the other.]
Cube: all combinations
Cuboid: one single combination
The number of cuboids grows exponentially
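The lattice can be reproduced with a few lines of Python. This is a minimal sketch that enumerates every cuboid of a cube as a bit vector and groups the cuboids by how many dimensions they include; the dimension names `A` to `D` are placeholders.

```python
from itertools import product


def cuboids(dimensions):
    """Yield every cuboid as (bit vector, tuple of included dimensions)."""
    for bits in product((1, 0), repeat=len(dimensions)):
        included = tuple(d for d, bit in zip(dimensions, bits) if bit)
        yield bits, included


def by_level(dimensions):
    """Group cuboid bit vectors by the number of included dimensions."""
    levels = {}
    for bits, _ in cuboids(dimensions):
        levels.setdefault(sum(bits), []).append(bits)
    return levels


DIMS = ("A", "B", "C", "D")  # placeholder dimension names
lattice = by_level(DIMS)
level_sizes = [len(lattice[k]) for k in range(len(DIMS) + 1)]
```

For four dimensions the levels hold 1, 4, 6, 4, and 1 cuboids, 16 in total, and every added dimension doubles that total.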
Apache Kylin: Aggregation Groups
Problem: The number of cuboids grows exponentially
Example: a cube with 30 dimensions
Number of cuboids: 2³⁰ > 1 billion
Solution: Partial cube
Divide the dimensions of the OLAP cube into aggregation groups
Example: 30 dimensions split into 3 groups of 10 dimensions
Number of cuboids: 2¹⁰ + 2¹⁰ + 2¹⁰ = 3072 << 1 billion
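The arithmetic on this slide can be checked with a short Python sketch. It follows the slide's simplified counting, in which each aggregation group contributes its own full set of combinations; the function names are illustrative only.

```python
def full_cube_cuboids(n_dimensions):
    """Number of cuboids when every dimension combination is materialized."""
    return 2 ** n_dimensions


def partial_cube_cuboids(group_sizes):
    """Number of cuboids when each aggregation group is cubed on its own."""
    return sum(2 ** size for size in group_sizes)


full = full_cube_cuboids(30)
partial = partial_cube_cuboids([10, 10, 10])
reduction_factor = full // partial
```

Here `full` is 1,073,741,824 and `partial` is 3,072. The price of the reduction is that a combination of dimensions spanning different groups is no longer precomputed.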
Apache Kylin: Cube Build Process
[Diagram, steps (1) to (3): starting from the source Hive tables, HiveQL produces an intermediate Hive table and the dimension dictionaries; MapReduce then builds the N-cuboid, stored as HDFS SequenceFiles.]
[Diagram: a chain of MapReduce jobs derives the N-1-cuboid from the N-cuboid and so on down to the 0-cuboid, each stored as HDFS SequenceFiles. A further MapReduce job converts the results into HFiles, which are bulk-imported into HBase.]
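The layer-by-layer idea in the diagram can be sketched in memory with Python: the N-cuboid is aggregated from the input rows, and every smaller cuboid is derived from a cuboid one layer above it rather than from the raw data. This is an illustration under simplifying assumptions (a single SUM measure, dictionaries instead of MapReduce jobs and SequenceFiles), not Kylin's implementation, and the sample rows are made up.

```python
from collections import defaultdict
from itertools import combinations


def aggregate(parent, keep_positions):
    """Derive a child cuboid by keeping only some key positions of the parent."""
    child = defaultdict(int)
    for key, value in parent.items():
        child[tuple(key[i] for i in keep_positions)] += value
    return dict(child)


def build_by_layer(rows, n_dims):
    """Build all cuboids from the N-cuboid down to the 0-cuboid.

    rows: iterable of (dimension tuple, measure value)
    Returns {tuple of dimension indexes: {key tuple: summed measure}}.
    """
    all_dims = tuple(range(n_dims))
    base = defaultdict(int)
    for key, value in rows:
        base[tuple(key)] += value
    cube = {all_dims: dict(base)}
    for size in range(n_dims - 1, -1, -1):
        for dims in combinations(all_dims, size):
            # Pick any cuboid from the layer above that contains `dims`.
            parent_dims = next(
                p for p in cube if len(p) == size + 1 and set(dims) <= set(p)
            )
            positions = [parent_dims.index(d) for d in dims]
            cube[dims] = aggregate(cube[parent_dims], positions)
    return cube


# Made-up rows: ((drink, city), amount)
ROWS = [
    (("Beer", "Berlin"), 8),
    (("Beer", "Paris"), 7),
    (("Wine", "Berlin"), 30),
    (("Beer", "Berlin"), 2),
]
cube = build_by_layer(ROWS, 2)
```

Deriving each layer from the one above keeps the input of every step as small as possible, which is the point of chaining the jobs from the N-cuboid down to the 0-cuboid.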