Trivadis TechEvent 2017 Querying distributed data with SQL and Apache Drill by Jonatan Kazmierczak
Querying distributed data (CSV, JSON, MongoDB) with SQL and Apache Drill
Jonatan Kazmierczak
-- SQL and Drill -- Jonatan-Kazmierczak -- TechEvent 2017 --
Why is it important?
Apache Drill lets you query, explore, transform and expose distributed data stored in various formats using standard SQL, without involving expensive and complex tools, processes and infrastructure.
About author
senior consultant at Trivadis
creator of Class Visualizer
working with Java+SQL for 20 years
top rated participant in contests in programming and data science: HackerRank, TopCoder, Code Jam
conference speaker
fan of Atari XL/XE demos
About author – cont.
www.hackerrank.com/jonatan_k
1st rank in Java
1st rank in JavaScript
Top 1% in functional programming in Scala
Top 1% in SQL
Medalist in algorithmic contests
Agenda
Introduction
Demo: starting with Drill
Technical details
Demo: deep dive into Drill
Summary
Q & A
Computers – before
www.amibay.com/showthread.php?71410-Atari-65XE-BOX-XC12-BOX-2-Quickshots
What is Apache Drill?
low-latency, distributed, schema-free SQL query engine for large-scale datasets
designed to scale to several thousands of nodes and query petabytes of data at the speeds required by BI/Analytics environments
Basic info
Website: drill.apache.org
Current version: 1.11.0
Query language: SQL:2003
Interfaces: shell, web console, JDBC/ODBC, REST API, Java API, C++ API
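Of the interfaces listed, the REST API is the easiest to try from a script. The sketch below only builds the HTTP request for Drill's `/query.json` endpoint and prints it; nothing is sent. It assumes a Drillbit whose web console listens on `http://localhost:8047`, and the helper name `build_drill_query_request` is made up for this illustration.

```python
import json
import urllib.request


def build_drill_query_request(sql, base_url="http://localhost:8047"):
    """Build (but do not send) a POST request for Drill's REST query endpoint.

    base_url is an assumption: adjust it to wherever your Drillbit's
    web console is reachable.
    """
    payload = {"queryType": "SQL", "query": sql}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/query.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_drill_query_request("select * from dfs.demo.`countries.csv`")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# Against a running Drillbit, the request could then be sent with:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect before pointing the script at a real cluster.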
Supported data sources and formats
RDBMS
FS
NoSQL
Features
Dynamic schema discovery
Flexible data model
In-memory data processing (whenever possible)
Extensible architecture
Distributed and embedded mode
Distributed setup
[Diagram: Client; ZooKeeper; Node 1, Node 2 and Node 3, each running a Drillbit]
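In a distributed setup a client usually locates the Drillbits through ZooKeeper, while embedded mode needs no external ZooKeeper. A minimal sketch of the two JDBC connection URL shapes follows. The host names are hypothetical, and the ZooKeeper path `drill/drillbits1` is assumed to match the cluster's configuration; `drill_jdbc_url` is an illustrative helper, not part of Drill.

```python
def drill_jdbc_url(zk_hosts=None, zk_path="drill/drillbits1"):
    """Return a Drill JDBC URL.

    zk_hosts: list of "host:port" ZooKeeper addresses (hypothetical here),
              or None/empty for embedded mode.
    zk_path:  "<directory>/<cluster id>" under which the Drillbits are
              registered; assumed value, check your cluster configuration.
    """
    if not zk_hosts:
        # Embedded mode: Drill runs locally, no external ZooKeeper quorum
        return "jdbc:drill:zk=local"
    return "jdbc:drill:zk=" + ",".join(zk_hosts) + "/" + zk_path


print(drill_jdbc_url())
print(drill_jdbc_url(["node1:2181", "node2:2181", "node3:2181"]))
```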
Sample query
select * from dfs.demo.`countries.csv`
dfs: storage plugin
demo: workspace
`countries.csv`: table / view / file / document
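The three-part name in the sample query can be assembled programmatically. The sketch below composes the storage plugin, workspace and file name into the form used above; `drill_table_path` is a hypothetical helper, and it does not handle names that themselves contain backticks.

```python
def drill_table_path(storage_plugin, workspace, name):
    """Compose <storage plugin>.<workspace>.`<name>` for use in a FROM clause.

    The last part is wrapped in backticks because file names such as
    countries.csv contain a dot. Embedded backticks are not escaped here.
    """
    return f"{storage_plugin}.{workspace}.`{name}`"


query = "select * from " + drill_table_path("dfs", "demo", "countries.csv")
print(query)  # select * from dfs.demo.`countries.csv`
```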
Query execution
Components: Client; Drillbit (Foreman, SQL Parser, Optimizer, Parallelizer, Executor, Storage Plugin)
Client -> Foreman: SQL query
SQL Parser: parse SQL query -> logical plan
Optimizer: optimize logical plan -> physical plan
Parallelizer: parallelize physical plan -> execution tree (fragments)
Executor: execute fragments
Storage Plugin: fetch data -> data
Executor -> Foreman: results
Foreman -> Client: results
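The labels of the query execution diagram describe a pipeline in which each component turns one artifact into the next. The toy trace below only models that hand-off order using the labels from the slide; it is not Drill code, and the data fetch through the Storage Plugin is folded into the Executor step for brevity.

```python
# (component, action, artifact produced), in the order given on the slide
STAGES = [
    ("SQL Parser", "parse SQL query", "logical plan"),
    ("Optimizer", "optimize logical plan", "physical plan"),
    ("Parallelizer", "parallelize physical plan", "execution tree (fragments)"),
    ("Executor", "execute fragments", "results"),
]


def trace(sql):
    """Return a human-readable trace of the stages a query passes through."""
    artifact = "SQL query"
    lines = [f"Client -> Foreman: {artifact} ({sql})"]
    for component, action, produced in STAGES:
        lines.append(f"{component}: {action}: {artifact} -> {produced}")
        artifact = produced
    lines.append(f"Foreman -> Client: {artifact}")
    return lines


for line in trace("select * from dfs.demo.`countries.csv`"):
    print(line)
```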
Advantages
Easy to start working with
Concept of SQL-on-Anything
Using standard SQL
Disadvantages
Partially implemented or unfinished features
Gaps in documentation
Use cases
Data exploration
Data transformation
BI / Data analytics