James Taylor jtaylor@salesforce


Phoenix
James Taylor

We put the SQL back in the NoSQL


Agenda


- Phoenix Overview
- Phoenix Implementation
- Performance Analysis
- Phoenix Roadmap
- Demo


Phoenix Overview


- SQL layer on top of HBase
- Delivered as an embedded JDBC driver
- Targeting low-latency queries over HBase data
- Columns modeled as a multi-part row key and key values
- Query engine transforms SQL into a series of scans
- Using native HBase APIs and capabilities
  - Coprocessors for aggregation
  - Custom filters for expression evaluation
  - Transaction isolation through scan time range
  - Optionally client-controlled timestamps
- Open sourcing soon
- 100% Java
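Transaction isolation through scan time range builds on HBase keeping timestamped versions of each cell: a scan restricted to a time range only sees versions written inside it (HBase's `Scan.setTimeRange(minStamp, maxStamp)` treats `maxStamp` as exclusive). A minimal in-memory sketch under that assumption follows; `VersionedCell` is a hypothetical class, not an HBase or Phoenix API. Capping the range at the time a query started hides writes that land while the query is running.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory model of one HBase cell's versions (timestamp -> value). */
final class VersionedCell {
    private final TreeMap<Long, String> versions = new TreeMap<>();

    /** Stores a version written at the given timestamp. */
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    /**
     * Returns the newest value with a timestamp in [minStamp, maxStamp), the same
     * half-open range an HBase scan time range uses, or null if nothing is visible.
     */
    String read(long minStamp, long maxStamp) {
        NavigableMap<Long, String> visible = versions.subMap(minStamp, true, maxStamp, false);
        return visible.isEmpty() ? null : visible.lastEntry().getValue();
    }
}
```

For example, after `put(100L, "a")` and `put(200L, "b")`, a query that started at time 150 calls `read(0L, 150L)` and still gets `"a"`, while a query started later with `read(0L, 250L)` sees `"b"`.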

Page 4: James Taylor jtaylor@salesforce

Phoenix SQL Support

SELECT <expression>…
FROM <table>
WHERE <expression>
GROUP BY <expression>…
HAVING <aggregate expression>
ORDER BY <aggregate expression>…
LIMIT <value>

- Aggregation functions: MIN, MAX, AVG, SUM, COUNT
- Built-in functions: SUBSTR, ROUND, TRUNC, TO_CHAR, TO_DATE
- Operators: =, !=, <>, <, <=, >, >=, LIKE, AND, OR, NOT
- Bind parameters: ?, :#
- CASE WHEN
- IN (<value>…)
- DDL/DML (in progress):
  - CREATE/DROP <table>
  - DELETE FROM <table> WHERE <expression>
  - UPSERT INTO <table> [(<column>…)] VALUES (<value>…)


Sample Queries

SELECT host, TRUNC(dateTime, 'DAY'), AVG(cache_hit), MIN(cache_hit), MAX(cache_hit)
FROM server_metrics
WHERE host LIKE 'cs11-%'
  AND dateTime > TO_DATE('2012-04-01')
  AND dateTime < TO_DATE('2012-07-01')
GROUP BY host, TRUNC(dateTime, 'DAY')
HAVING MIN(cache_hit) < 90
ORDER BY host, AVG(cache_hit)

SELECT product_number, product_name,
  CASE WHEN list_price = 0 THEN 'Mfg item - not for resale'
       WHEN list_price < 50 THEN 'Under $50'
       WHEN list_price >= 50 AND list_price < 250 THEN 'Under $250'
       WHEN list_price >= 250 AND list_price < 1000 THEN 'Under $1000'
       ELSE 'Over $1000'
  END AS price_category
FROM product_catalogue
WHERE product_category IN ('Camping', 'Hiking')
  AND (product_name LIKE '%Pack' OR product_name LIKE '% Cots %')


Query Processing

Product Metrics HTable
Row Key: ORG_ID | DATE | FEATURE
Key Values: TXNS | IO_TIME | RESPONSE_TIME

Scan:
- Start key: ORG_ID (:1) + DATE (:2)
- End key: ORG_ID (:1) + DATE (:3)
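The start and end keys follow from the row key being ORG_ID, DATE and FEATURE concatenated, so all rows for one org over a date range are contiguous. A minimal sketch, assuming a hypothetical encoding (UTF-8 ORG_ID terminated by a zero byte, DATE as an 8-byte big-endian millisecond value with the sign bit flipped so unsigned byte order matches numeric order, FEATURE last); Phoenix's actual encoding may differ. Because HBase treats the stop row as exclusive and FEATURE bytes follow DATE, the sketch turns the inclusive `date <= :3` bound into the smallest key past the ORG_ID (:1) + DATE (:3) prefix.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical row-key layout: ORG_ID + 0x00 + DATE (8 bytes) + FEATURE (last, no separator). */
final class ProductMetricsKey {
    /** DATE as 8 big-endian bytes; flipping the sign bit makes unsigned byte order match numeric order. */
    private static byte[] dateBytes(long epochMillis) {
        return ByteBuffer.allocate(Long.BYTES).putLong(epochMillis ^ Long.MIN_VALUE).array();
    }

    /** Leading key columns shared by every row of one org on one date. */
    static byte[] prefix(String orgId, long dateMillis) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] org = orgId.getBytes(StandardCharsets.UTF_8);
        out.write(org, 0, org.length);
        out.write(0); // separator terminating the variable-length ORG_ID
        byte[] date = dateBytes(dateMillis);
        out.write(date, 0, date.length);
        return out.toByteArray();
    }

    static byte[] rowKey(String orgId, long dateMillis, String feature) {
        byte[] p = prefix(orgId, dateMillis);
        byte[] f = feature.getBytes(StandardCharsets.UTF_8);
        byte[] key = Arrays.copyOf(p, p.length + f.length);
        System.arraycopy(f, 0, key, p.length, f.length);
        return key;
    }

    /** Smallest key greater than every key starting with prefix (empty = end of table). */
    static byte[] nextPrefix(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] next = Arrays.copyOf(prefix, i + 1); // drop trailing 0xFF bytes
                next[i]++;
                return next;
            }
        }
        return new byte[0];
    }

    /** {start (inclusive), stop (exclusive)} for: org_id = :1 AND date >= :2 AND date <= :3. */
    static byte[][] scanRange(String orgId, long fromDate, long toDateInclusive) {
        return new byte[][] {prefix(orgId, fromDate), nextPrefix(prefix(orgId, toDateInclusive))};
    }

    /** Unsigned lexicographic comparison, the order HBase sorts row keys in. */
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }
}
```

With `scanRange("org1", from, to)`, a row such as `rowKey("org1", to, "search")` sorts between the two keys, while rows for the day before `from` or the day after `to` fall outside them, so the region servers never read them.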

Filter: IO_TIME > 100

Aggregation:
- Intercepts scan on region server
- Builds map of distinct FEATURE values
- Returns one row per distinct group
- Client does final merge

SELECT feature, SUM(txns)
FROM product_metrics
WHERE org_id = :1
  AND date >= :2 AND date <= :3
  AND io_time > 100
GROUP BY feature
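The aggregation steps for this query can be modeled in a few lines: each region server applies the filter to its own rows and returns one partial SUM per FEATURE, and the client adds up partials that share a FEATURE, since one group can span several regions. This is a plain-Java sketch; `MetricRow` and `FeatureSum` are hypothetical names, not Phoenix's coprocessor API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical row of PRODUCT_METRICS holding only the columns the query touches. */
final class MetricRow {
    final String feature;
    final long txns;
    final long ioTime;

    MetricRow(String feature, long txns, long ioTime) {
        this.feature = feature;
        this.txns = txns;
        this.ioTime = ioTime;
    }
}

final class FeatureSum {
    /** Region-server side: apply the filter, then build one partial sum per distinct FEATURE. */
    static Map<String, Long> aggregateRegion(List<MetricRow> regionRows) {
        Map<String, Long> partial = new HashMap<>();
        for (MetricRow row : regionRows) {
            if (row.ioTime > 100) { // io_time > 100, evaluated before aggregation
                partial.merge(row.feature, row.txns, Long::sum);
            }
        }
        return partial;
    }

    /** Client side: the same FEATURE may come back from several regions, so sum the partials. */
    static Map<String, Long> mergeOnClient(List<Map<String, Long>> partials) {
        Map<String, Long> result = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((feature, partialSum) -> result.merge(feature, partialSum, Long::sum));
        }
        return result;
    }
}
```

Only one small map per region travels back to the client instead of every matching row, which matches the "returns one row per distinct group" step above.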


Phoenix Query Optimizations


- Start/stop key of scan based on AND-ed columns
  - Through SUBSTR, ROUND, TRUNC, LIKE
- Parallelized on client by chunking over start/stop key of scan
- Aggregation on region servers through coprocessor
  - Inline for GROUP BY over row-key-ordered columns
  - In-memory map per group otherwise
- WHERE clause executed through custom filters
  - Incremental evaluation with early termination
  - Evaluated through byte pointers
- IN and OR over same column (in progress)
  - Becomes batched get or filter with next row hint
- Top N queries (future)
  - Through coprocessor keeping top N rows
- TABLESAMPLE (future)
  - Becomes filter with next row hint
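For a LIKE pattern with a literal prefix, such as `host LIKE 'cs11-%'` in the first sample query, the prefix bounds the key range: every match starts with `cs11-`, so the scan can start at `cs11-` and stop just before `cs11.` (the prefix with its last byte incremented). A minimal sketch, assuming the LIKE column is the leading row-key column (the slides do not give SERVER_METRICS's key) and ignoring escape characters; `LikeKeyRange` is hypothetical. The range only narrows the scan, so the filter still has to evaluate the full pattern.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: derive a scan key range from a LIKE pattern on the leading row-key column. */
final class LikeKeyRange {
    /** Literal characters before the first LIKE wildcard ('%' or '_'); escapes are not handled. */
    static String literalPrefix(String pattern) {
        int i = 0;
        while (i < pattern.length() && pattern.charAt(i) != '%' && pattern.charAt(i) != '_') {
            i++;
        }
        return pattern.substring(0, i);
    }

    /** Smallest byte string greater than every string starting with prefix (empty = unbounded). */
    static byte[] nextPrefix(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] next = Arrays.copyOf(prefix, i + 1);
                next[i]++;
                return next;
            }
        }
        return new byte[0];
    }

    /**
     * {start (inclusive), stop (exclusive)}, or null when the pattern begins with a wildcard
     * and no range can be derived (full scan plus filter).
     */
    static byte[][] keyRange(String pattern) {
        String prefix = literalPrefix(pattern);
        if (prefix.isEmpty()) {
            return null;
        }
        byte[] start = prefix.getBytes(StandardCharsets.UTF_8);
        return new byte[][] {start, nextPrefix(start)};
    }
}
```

For `'%Pack'` there is no literal prefix, so `keyRange` returns null and the query falls back to a full scan with the filter.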

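The two GROUP BY strategies differ in memory use. When rows arrive ordered by the grouping columns (a row-key prefix), a group is complete as soon as the key changes, so it can be emitted inline; otherwise every group has to sit in a map until the scan ends. A plain-Java sketch of both for SUM over hypothetical (key, value) rows; `GroupBySketch` is an illustrative name, not Phoenix's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Two hypothetical ways to compute SUM(value) GROUP BY key. */
final class GroupBySketch {
    /** Inline: input must be sorted by key; holds only the current group in memory. */
    static List<Map.Entry<String, Long>> inlineSum(List<Map.Entry<String, Long>> rowsSortedByKey) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        String currentKey = null;
        long sum = 0;
        for (Map.Entry<String, Long> row : rowsSortedByKey) {
            if (currentKey != null && !currentKey.equals(row.getKey())) {
                out.add(Map.entry(currentKey, sum)); // key changed: previous group is complete
                sum = 0;
            }
            currentKey = row.getKey();
            sum += row.getValue();
        }
        if (currentKey != null) {
            out.add(Map.entry(currentKey, sum)); // flush the last group
        }
        return out;
    }

    /** Map-based: any input order, but every group stays in memory until the scan ends. */
    static Map<String, Long> mapSum(List<Map.Entry<String, Long>> rows) {
        Map<String, Long> groups = new HashMap<>();
        for (Map.Entry<String, Long> row : rows) {
            groups.merge(row.getKey(), row.getValue(), Long::sum);
        }
        return groups;
    }
}
```

With rows `(a, 1), (a, 2), (b, 5)`, both methods produce a = 3 and b = 5; `inlineSum` never holds more than the group it is currently summing.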

Phoenix Performance



Phoenix Roadmap


- Increase breadth of SQL support
  - DML/DDL (in progress)
  - Derived tables (SELECT * FROM (SELECT foo FROM bar))
  - More built-in functions: COALESCE, UPPER, TRIM
  - More operators: ||, IS NULL, *, /, +, -
- Secondary indexes
  - Multiple projections for immutable data
    - Reordered columns in row key
    - Different levels of aggregation
  - Incrementally maintained for mutable data
- TABLESAMPLE for sampling
- Improve multi-byte support
- Joins
  - Hash join
- OLAP extensions
  - OVER
  - PARTITION BY


Demo


Time-series database charting

http://goo.gl/61WRs


Thank you!
Questions/comments?