Apache Big Data EU 2015 - Phoenix
-
Upload
nick-dimiduk -
Category
Technology
-
view
2.858 -
download
1
Transcript of Apache Big Data EU 2015 - Phoenix
![Page 1: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/1.jpg)
V5
The Evolu*on of a Rela*onal Database Layer over HBase
@ApachePhoenix h0p://phoenix.apache.org
2015-‐09-‐29
Nick Dimiduk (@xefyr) h0p://n10k.com
#apachebigdata
![Page 2: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/2.jpg)
Agenda o What is Apache Phoenix? o State of the Union o Latest Releases o A brief aside on TransacQons o What’s Next? o Q & A
![Page 3: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/3.jpg)
WHAT IS APACHE PHOENIX PuUng the SQL back into NoSQL
![Page 4: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/4.jpg)
What is Apache Phoenix?
o A relaQonal database layer for Apache HBase o Query engine
o Transforms SQL queries into naQve HBase API calls o Pushes as much work as possible onto the cluster for parallel execuQon
o Metadata repository o Typed access to data stored in HBase tables
o A JDBC driver o A top level Apache So_ware FoundaQon project
o Originally developed at Salesforce o A growing community with momentum
![Page 5: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/5.jpg)
What is Apache Phoenix?
==
==
![Page 6: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/6.jpg)
Where Does Phoenix Fit In? Sqoo
p
RDB Da
ta Collector
Flum
e
Log Da
ta Collector
Zookeepe
r Co
ordinaQo
n
YARN / MRv2
Cluster Resource Manager / MapReduce
HDFS
Hadoop Distributed File System
GraphX Graph analysis framework
Phoenix Query execuQon engine
HBase Distributed Database
The Java Virtual Machine Hadoop
Common JNI
Spark
IteraQve In-‐Memory ComputaQon
MLLib Data mining
Pig Data ManipulaQon
Hive Structured Query
Phoenix JDBC client
![Page 7: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/7.jpg)
Phoenix Puts the SQL Back in NoSQL
o Familiar SQL interface for (DDL, DML, etc.) o Read only views on exisQng HBase data o Dynamic columns extend schema at runQme o Flashback queries using versioned schema
o Data types o Secondary indexes o Joins o Query planning and opQmizaQon o MulQ-‐tenancy
![Page 8: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/8.jpg)
Phoenix Puts the SQL Back in NoSQL
Accessing HBase data with Phoenix can be substanQally easier than direct HBase API use
Versus HTable t = new HTable(“foo”); RegionScanner s = t.getScanner(new Scan(…,
new ValueFilter(CompareOp.GT, new CustomTypedComparator(30)), …));
while ((Result r = s.next()) != null) { // blah blah blah Java Java Java } s.close(); t.close();
SELECT * FROM foo WHERE bar > 30
HTable t = new HTable(“foo”); RegionScanner s = t.getScanner(new Scan(…,
new ValueFilter(CompareOp.GT, new CustomTypedComparator(30)), …));
while ((Result r = s.next()) != null) { // blah blah blah Java Java Java } s.close(); t.close();
Your BI tool probably can’t do this
(And we didn’t include error handling…)
![Page 9: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/9.jpg)
STATE OF THE UNION So how’s things?
![Page 10: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/10.jpg)
And more ….
![Page 11: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/11.jpg)
State of the Union o SQL Support: Broad enough to run TPC queries
o Joins, Sub-‐queries, Derived tables, etc. o Secondary Indices: Three different strategies
o Immutable for write-‐once/append only data o Global for read-‐heavy mutable data o Local for write-‐heavy mutable or immutable data
o StaQsQcs driven parallel execuQon o Monitoring & Management
o Tracing, metrics
![Page 12: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/12.jpg)
Join and Subquery Support o Grammar
o inner join; le_/right/full outer join; cross join o AddiQonal
o semi join; anQ join o Algorithms
o hash-‐join; sort-‐merge-‐join o OpQmizaQons
o Predicate push-‐down; FK-‐to-‐PK join opQmizaQon; Global index with missing data columns; Correlated subquery rewrite
![Page 13: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/13.jpg)
TPC Example 1 Small-‐QuanQty-‐Order Revenue Query (Q17)
select sum(l_extendedprice) / 7.0 as avg_yearly from lineitem, part where p_partkey = l_partkey and p_brand = '[B]' and p_container = '[C]' and l_quantity < ( select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey );
CLIENT 4-WAY FULL SCAN OVER lineitem PARALLEL INNER JOIN TABLE 0 CLIENT 1-WAY FULL SCAN OVER part SERVER FILTER BY p_partkey = ‘[B]’ AND p_container = ‘[C]’ PARALLEL INNER JOIN TABLE 1 CLIENT 4-WAY FULL SCAN OVER lineitem SERVER AGGREGATE INTO DISTINCT ROWS BY l_partkey AFTER-JOIN SERVER FILTER BY l_quantity < $0
![Page 14: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/14.jpg)
TPC Example 2 Order Priority Checking Query (Q4)
select o_orderpriority, count(*) as order_count from orders where o_orderdate >= date '[D]' and o_orderdate < date '[D]' + interval '3' month and exists ( select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by o_orderpriority order by o_orderpriority;
CLIENT 4-WAY FULL SCAN OVER orders SERVER FILTER o_orderdate >= ‘[D]’ AND o_orderdate < ‘[D]’ + 3(d) SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY o_orderpriority CLIENT MERGE SORT SKIP-SCAN JOIN TABLE 0 CLIENT 4-WAY FULL SCAN OVER lineitem SERVER FILTER BY l_commitdate < l_receiptdate DYNAMIC SERVER FILTER BY o_orderkey IN l_orderkey
![Page 15: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/15.jpg)
Join support -‐ what can't we do?
o Nested Loop Join o StaQsQcs Guided Join Algorithm
o Smartly choose the smaller table for the build side o Smartly switch between hash-‐join and sort-‐merge-‐join o Smartly turn on/off FK-‐to-‐PK join opQmizaQon
![Page 16: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/16.jpg)
LATEST RELEASES The new goodness, from our community to your datacenter!
![Page 17: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/17.jpg)
Phoenix 4.4 o HBase 1.0 Support o Func*onal Indexes o User Defined Func*ons o Query Server with Thin Driver o Union All support o TesQng at scale with Pherf o MR index build o Spark integraQon o Date built-‐in funcQons – WEEK, DAYOFMONTH, etc.
![Page 18: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/18.jpg)
Phoenix 4.4: FuncQonal Indexes
o CreaQng an index on an expression as opposed to just a column value.
o Example without index, full table scan: SELECT AVG(response_*me) FROM SERVER_METRICS WHERE DAYOFMONTH(create_*me) = 1
o Add funcQonal index, range scan: CREATE INDEX day_of_month_idx ON SERVER_METRICS (DAYOFMONTH(create_*me)) INCLUDE (response_*me)
![Page 19: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/19.jpg)
Phoenix 4.4: User Defined FuncQons o Extension points to Phoenix for domain-‐specific funcQons.
o For example, a geo-‐locaQon applicaQon might load a set of UDFs like this: CREATE FUNCTION WOEID_DISTANCE(INTEGER,INTEGER) RETURNS INTEGER AS ‘org.apache.geo.woeidDistance’ USING JAR ‘/lib/geo/geoloc.jar’
o Querying, funcQonal indexing, etc. then possible: SELECT * FROM woeid a JOIN woeid b ON a.country = b.country WHERE woeid_distance(a.ID,b.ID) < 5
![Page 20: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/20.jpg)
Phoenix 4.4: Query Server + Thin Driver o Offloads query planning and execuQon to different server(s)
o Minimizes client dependencies o Enabler for ODBC driver (work in progress)
o Connect like this instead: Connection conn = DriverManager.getConnection(
"jdbc:phoenix:thin:url=http://localhost:8765");
o SQll evolving: no backward compa*bility guarantees yet
h0p://phoenix.apache.org/server.html
![Page 21: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/21.jpg)
Phoenix 4.4 o HBase 1.0 Support o FuncQonal Indexes o User Defined FuncQons o Query Server with Thin Driver o Union All support o Tes*ng at scale with Pherf o MR index build o Spark integra*on o Date built-‐in func*ons – WEEK, DAYOFMONTH, &c.
![Page 22: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/22.jpg)
Phoenix 4.5 o HBase 1.1 Support o Client-‐side per-‐statement metrics o SELECT without FROM o Now possible to alter tables with views o Math built-‐in funcQons – SIGN, SQRT, CBRT, EXP, &c. o Array built-‐in funcQons – ARRAY_CAT, ARRAY_FILL, &c.
o UDF, View, FuncQonal Index enhancements o Pherf and Performance enhancements
![Page 23: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/23.jpg)
TRANSACTIONS I was told there would be…
![Page 24: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/24.jpg)
TransacQons (Work in Progress) o Snapshot isolaQon model
o Using Tephra (h0p://tephra.io/) o Supports REPEATABLE_READ isolaQon level o Allows reading your own uncommi0ed data
o OpQonal o Enabled on a table by table basis o No performance penalty when not used
o Work in progress, but close to release o Try our txn branch o Will be available in next release
![Page 25: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/25.jpg)
OpQmisQc Concurrency Control
• Avoids cost of locking rows and tables • No deadlocks or lock escalations • Cost of conflict detection and possible rollback is higher • Good if conflicts are rare: short transaction, disjoint
partitioning of work • Conflict detection not always necessary: write-once/
append-only data
![Page 26: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/26.jpg)
Tephra Architecture
ZooKeeper
Tx Manager (standby)
HBase
Master 1
Master 2
RS 1
RS 2 RS 4
RS 3
Client 1
Client 2
Client N
Tx Manager (active)
![Page 27: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/27.jpg)
time out try abort
failed roll back in HBase
write to
HBase do work
Client Tx Manager
none
complete V abort succeeded
in progress
start tx start
start tx
commit try commit check conflicts
invalid X invalidate
failed
TransacQon Lifecycle
![Page 28: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/28.jpg)
Tephra Architecture
• TransactionAware client!• Coordinates transaction lifecycle with manager • Communicates directly with HBase for reads and writes
• Transaction Manager • Assigns transaction IDs • Maintains state on in-progress, committed and invalid transactions!
• Transaction Processor coprocessor • Applies server-side filtering for reads • Cleans up data from failed transactions, and no longer visible versions
![Page 29: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/29.jpg)
WHAT’S NEXT? A brave new world
![Page 30: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/30.jpg)
What’s Next?
o Is Phoenix done? o What about the Big Picture?
o How can Phoenix be leveraged in the larger ecosystem?
o Hive, Pig, MR integraQon with Phoenix today, but it’s not a great story
![Page 31: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/31.jpg)
What’s Next? You are here
![Page 32: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/32.jpg)
Moving to Apache Calcite o Query parser, compiler, and planner framework
o SQL-‐92 compliant (ever argue SQL with Julian? :-‐) ) o Enables Phoenix to get missing SQL support
o Pluggable cost-‐based opQmizer framework o Sane way to model push down through rules
o Interop with other Calcite adaptors o Not for free, but it becomes feasible o Already used by Drill, Hive, Kylin, Samza o Supports any JDBC source (i.e. RDBMS -‐ remember them :-‐) )
o One cost-‐model to rule them all
![Page 33: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/33.jpg)
How does Phoenix plug in?
Calcite Parser & Validator
Calcite Query OpQmizer
Phoenix Query Plan Generator
Phoenix RunQme
Phoenix Tables over HBase
JDBC Client
SQL + Phoenix specific grammar
Built-‐in rules + Phoenix
specific rules
![Page 34: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/34.jpg)
OpQmizaQon Rules
o AggregateRemoveRule o FilterAggregateTransposeRule o FilterJoinRule o FilterMergeRule o JoinCommuteRule o PhoenixFilterScanMergeRule o PhoenixJoinSingleAggregateMergeRule o …
![Page 35: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/35.jpg)
Query Example (filter push-‐down and smart join algorithm)
LogicalFilter filter: $0 = ‘x’
LogicalJoin type: inner
cond: $3 = $7
LogicalProject projects: $0, $5
LogicalTableScan table: A
LogicalTableScan table: B
PhoenixTableScan table: ‘a’
filter: $0 = ‘x’
PhoenixServerJoin type: inner
cond: $3 = $1
PhoenixServerProject projects: $2, $0
OpQmizer (with
RelOptRules & ConvertRules)
PhoenixTableScan table: ‘b’
PhoenixServerProject projects: $0, $2
PhoenixServerProject projects: $0, $3
![Page 36: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/36.jpg)
Query Example (filter push-‐down and smart join algorithm)
ScanPlan table: ‘a’
skip-‐scan: pk0 = ‘x’ projects: pk0, c3
HashJoinPlan types {inner} join-‐keys: {$1} projects: $2, $0
Build hash-‐key: $1
Phoenix Implementor
PhoenixTableScan table: ‘a’
filter: $0 = ‘x’
PhoenixServerJoin type: inner
cond: $3 = $1
PhoenixServerProject projects: $2, $0
PhoenixTableScan table: ‘b’
PhoenixServerProject projects: $0, $2
PhoenixServerProject projects: $0, $3
ScanPlan table: ‘b’
projects: col0, col2
Probe
![Page 37: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/37.jpg)
Interoperability Example o Joining data from Phoenix and mySQL
EnumerableJoin
PhoenixTableScan JdbcTableScan
Phoenix Tables over HBase mySQL Database
PhoenixToEnumerable Converter
JdbcToEnumerable Converter
![Page 38: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/38.jpg)
Query Example 1 WITH m AS (SELECT * FROM dept_manager dm WHERE from_date = (SELECT max(from_date) FROM dept_manager dm2 WHERE dm.dept_no = dm2.dept_no)) SELECT m.dept_no, d.dept_name, e.first_name, e.last_name FROM employees e JOIN m ON e.emp_no = m.emp_no JOIN departments d ON d.dept_no = m.dept_no ORDER BY d.dept_no;
![Page 39: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/39.jpg)
Query Example 2
SELECT dept_no, Qtle, count(*) FROM Qtles t JOIN dept_emp de ON t.emp_no = de.emp_no WHERE dept_no <= 'd006' GROUP BY rollup(dept_no, *tle) ORDER BY dept_no, Qtle;
![Page 40: Apache Big Data EU 2015 - Phoenix](https://reader034.fdocuments.us/reader034/viewer/2022042907/5882153f1a28ab3f4c8b5549/html5/thumbnails/40.jpg)
V5
Thanks!
@ApachePhoenix h0p://phoenix.apache.org
2015-‐09-‐29
Nick Dimiduk (@xefyr) h0p://n10k.com
#apachebigdata