Cb15 presentation-yingyi
Transcript of Cb15 presentation-yingyi
BIG DATA QUERY LANDSCAPE – N1QL AND MORE
Yingyi Bu | Couchbase
©2015 Couchbase Inc. 2
About Myself
Sr. Software Engineer @ Couchbase
Committer @ AsterixDB
(Research Project under Apache Incubation)
PhD Student @ UC Irvine
N1QL SQL++
@buyingyi
Agenda
Introduction
Operational Query Processing
Analytical Query Processing
Comparison and Unification
Summary
Introduction
Research Projects
Introduction
NoSQL
SQL-on-Hadoop
SQL++
Unification
Language Unification Research
  SQL Backward Compatible
  Rich Data Model
  Configurable Semantics
System Unification Research
  A Single Language Interface
  Scale-out for Both Workloads
  Resource Scheduling Underneath
Introduction
SQL++
Operational Query Processing
ArrayList<URI> nodes = new ArrayList<URI>();
// Add one or more nodes of your cluster
nodes.add(URI.create("http://127.0.0.1:8091/pools"));
// Try to connect to the client
CouchbaseClient client = null;
try {
    client = new CouchbaseClient(nodes, "default", "");
} catch (Exception e) {
    System.err.println("Error connecting to Couchbase: " + e.getMessage());
    System.exit(1);
}
// Put the key-value pair into Couchbase.
client.set("hello", "couchbase!").get();
// Return the result and cast it to string
String result = (String) client.get("hello");
System.out.println(result);
Operational Query Processing
Put
Get
JSON
Filtering
Flatten
Group-by
Aggregation
Join
Ordering
N1QL – SQL for NoSQL
Nested Data
Heterogeneous Data
Dynamic typing
[
  { "beer-sample": {
      "brewery_id": "bro",
      "abv": {"m1": 1, "m2": 2},
      "category": "North American Lager",
      "type": "beer"}
  },
  { "beer-sample": {
      "abv": 9.5,
      "brewery_id": "brouwerij"}
  }
]
SELECT category, type, abv.m1
FROM `beer-sample`
WHERE type = "beer"
[
  {
    "category": "North American Lager",
    "type": "beer",
    "m1": 1
  }
]
Standard SELECT pipeline
Joins, subqueries, set operators
UNNEST and NEST
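The query on this slide can be emulated in a few lines of Python to show how heterogeneous documents flow through a SELECT. This is only a sketch, not how N1QL is implemented: `MISSING`, `navigate`, and `run_query` are hypothetical names, the documents are the two from the slide with their "beer-sample" wrapper removed, and it assumes that a document whose `type` is missing fails the WHERE predicate.

```python
# Sentinel standing in for N1QL's MISSING value (hypothetical name).
MISSING = object()


def navigate(value, *path):
    """Follow a field path; return MISSING instead of raising when a step fails."""
    for field in path:
        if isinstance(value, dict) and field in value:
            value = value[field]
        else:
            return MISSING
    return value


# The two documents from the slide, unwrapped from their "beer-sample" keys.
docs = [
    {"brewery_id": "bro", "abv": {"m1": 1, "m2": 2},
     "category": "North American Lager", "type": "beer"},
    {"abv": 9.5, "brewery_id": "brouwerij"},
]


def run_query(documents):
    """Emulate: SELECT category, type, abv.m1 FROM `beer-sample` WHERE type = "beer"."""
    results = []
    for doc in documents:
        # A document without "type" yields MISSING here and is filtered out.
        if navigate(doc, "type") != "beer":
            continue
        row = {
            "category": navigate(doc, "category"),
            "type": navigate(doc, "type"),
            "m1": navigate(doc, "abv", "m1"),
        }
        # Fields that evaluate to MISSING are left out of the output object.
        results.append({k: v for k, v in row.items() if v is not MISSING})
    return results


print(run_query(docs))
```

The second document has a numeric `abv` and no `type`, yet nothing raises: navigation into it just produces the sentinel.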
Cassandra
SQL-like query language
Feature N1QL Cassandra
Lookup ✔ ✔
Filtering ✔ ✔
Ordering ✔ ✔
Aggregation ✔ ✖
Join ✔ ✖
Subqueries ✔ ✖
Unnest ✔ ✖
Schema-free ✔ ✖
SELECT firstname, lastname FROM users
WHERE birth_year = 1981 AND country = 'FR' ALLOW FILTERING;
SELECT * FROM posts
WHERE userid = 'john doe' AND (blog_title, posted_at) > ('John''s Blog', '2012-01-01');
MongoDB
JavaScript-like language
Feature N1QL MongoDB
Lookup ✔ ✔
Filtering ✔ ✔
Ordering ✔ ✔
Aggregation ✔ ✔
Join ✔ ✖
Subqueries ✔ ✖
Unnest ✔ ✔
Schema-free ✔ ✔
db.sales.aggregate([
  { $group: {
      _id: { month: { $month: "$date" }, day: { $dayOfMonth: "$date" }, year: { $year: "$date" } },
      totalPrice: { $sum: { $multiply: [ "$price", "$quantity" ] } },
      averageQuantity: { $avg: "$quantity" },
      count: { $sum: 1 }
  } }
])
db.users.find( { age: { $gt: 18 } }, { name: 1, address: 1 } ).limit(5)
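To make the `$group` stage above concrete, here is a rough Python equivalent of what it computes. The `sales` documents are hypothetical (the slide does not show the collection), `group_sales` is an illustrative name, and the result is sorted only to make the output deterministic; MongoDB itself makes no such ordering promise for `$group`.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample documents; the slide does not show the sales collection.
sales = [
    {"date": date(2014, 3, 1), "price": 10, "quantity": 2},
    {"date": date(2014, 3, 1), "price": 20, "quantity": 1},
    {"date": date(2014, 3, 15), "price": 5, "quantity": 10},
]


def group_sales(documents):
    """Group by (month, day, year) and compute the three accumulators of the slide."""
    groups = defaultdict(list)
    for doc in documents:
        d = doc["date"]
        groups[(d.month, d.day, d.year)].append(doc)

    results = []
    # Sorting is for reproducible output only.
    for (month, day, year), members in sorted(groups.items()):
        quantities = [m["quantity"] for m in members]
        results.append({
            "_id": {"month": month, "day": day, "year": year},
            "totalPrice": sum(m["price"] * m["quantity"] for m in members),
            "averageQuantity": sum(quantities) / len(quantities),
            "count": len(members),
        })
    return results


for row in group_sales(sales):
    print(row)
```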
Analytical Query Processing
Hive
INSERT OVERWRITE TABLE school_summary
SELECT subq1.school, COUNT(1)
FROM (SELECT a.status, b.school, b.gender
FROM status_updates a JOIN profiles b
ON (a.userid = b.userid
AND a.ds='2009-03-20' )) subq1
GROUP BY subq1.school
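[Figure: Hive plan for the query above, compiled into two MapReduce jobs (stages M1, R1, M2, R2). Operators shown for the first job: Scan (a), Scan (b), Filter, Project, ReduceSink, Join, Group-by, FileSink. Operators shown for the second job: Scan, ReduceSink, Group-by, FileSink.]
More data types than SQL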
Hadoop or Tez as runtime
Impala
INSERT OVERWRITE TABLE school_summary
SELECT subq1.school, COUNT(1)
FROM (SELECT a.status, b.school, b.gender
FROM status_updates a JOIN profiles b
ON (a.userid = b.userid
AND a.ds='2009-03-20' )) subq1
GROUP BY subq1.school
[Figure: Impala plan for the query above. Operators shown: HDFS Scan (a), HDFS Scan (b), Filter, Project, Hash Join, Pre-Agg, Merge-Agg, HDFS Write.]
ANSI SQL-92
HDFS/HBase as the storage
Native MPP execution engine
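The Hash Join, Pre-Agg, and Merge-Agg operators named in the Impala plan can be sketched in Python as follows. This illustrates the general technique of a partitioned hash join followed by two-phase aggregation, not Impala's actual engine; the sample rows, the function names, and the round-robin partitioning are all assumptions.

```python
from collections import Counter

# Hypothetical rows standing in for the two tables in the query.
status_updates = [
    {"userid": 1, "status": "hi", "ds": "2009-03-20"},
    {"userid": 2, "status": "yo", "ds": "2009-03-20"},
    {"userid": 1, "status": "old", "ds": "2009-03-19"},
    {"userid": 3, "status": "hey", "ds": "2009-03-20"},
]
profiles = [
    {"userid": 1, "school": "UCI", "gender": "f"},
    {"userid": 2, "school": "UCI", "gender": "m"},
    {"userid": 3, "school": "UCSD", "gender": "f"},
]


def hash_join(probe_rows, build_rows, key):
    """Build a hash table on one input, then probe it with the other."""
    table = {}
    for row in build_rows:
        table.setdefault(row[key], []).append(row)
    for row in probe_rows:
        for match in table.get(row[key], []):
            yield {**row, **match}


def pre_aggregate(rows):
    """Partial COUNT per school, computed locally on one partition."""
    return Counter(row["school"] for row in rows)


def merge_aggregate(partials):
    """Combine the partial counts from all partitions."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


def school_summary(updates, profs, num_partitions=2):
    filtered = [u for u in updates if u["ds"] == "2009-03-20"]
    # Round-robin split to mimic scan fragments running in parallel.
    partitions = [filtered[i::num_partitions] for i in range(num_partitions)]
    partials = [pre_aggregate(hash_join(p, profs, "userid")) for p in partitions]
    return merge_aggregate(partials)


print(school_summary(status_updates, profiles))
```

Pre-aggregating before the merge means only one small counter per partition has to be exchanged, rather than every joined row.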
Spark SQL
ctx = new HiveContext()
users = ctx.table("users")
young = users.where(users("age") < 21)
println(young.count())
SELECT count(*) FROM users
WHERE age < 21
[Figure: Spark SQL planning pipeline. Labels: SQL, DataFrames, Unresolved Logical Plan, Logical Plan, Physical Plans, Cost Model, Selected Physical Plan, RDDs, Catalog.]
Drill
ANSI SQL-92
Nested Data
Schema Inference
Centralized schema
  Static
  Managed by DBAs
Self-describing or schema-less
  Dynamically evolving
  Managed by applications
  Embedded in data
CSV, JSON, Parquet, ORC
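As a rough illustration of what "schema embedded in data" implies, the sketch below derives a schema from JSON records at read time instead of looking it up in a catalog. It is not Drill's algorithm; `infer_schema`, `merge_schemas`, and the sample records are hypothetical.

```python
import json


def infer_schema(value):
    """Describe the shape of a parsed JSON value."""
    if isinstance(value, dict):
        return {key: infer_schema(v) for key, v in value.items()}
    if isinstance(value, list):
        # An array is described by the distinct schemas of its elements.
        kinds = []
        for item in value:
            kind = infer_schema(item)
            if kind not in kinds:
                kinds.append(kind)
        return {"array_of": kinds}
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


def merge_schemas(records):
    """Union the per-record field types, so the schema evolves with the data."""
    merged = {}
    for record in records:
        for field, kind in infer_schema(record).items():
            kinds = merged.setdefault(field, [])
            if kind not in kinds:
                kinds.append(kind)
    return merged


# Hypothetical self-describing input, one JSON document per line.
lines = [
    '{"name": "a", "age": 30}',
    '{"name": "b", "age": "unknown", "tags": ["x"]}',
]
print(merge_schemas(json.loads(line) for line in lines))
```

The second record changes the type of `age` and introduces `tags`; the merged schema simply grows to cover both.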
Comparison and Unification
Comparison and Unification
AsterixDB – System Unification Research
Query language?
Language Comparisons
SQL++ – Language Unification Research
N1QL and SQL++
SQL++
Unification
Research Projects
NoSQL data model with schema flexibility
Declarative full-fledged query language (AQL)
Partitioned native LSM-based storage
Secondary index (B-Tree, R-Tree, and keyword index)
Single-row transaction
Spatial/temporal data types
External data (HDFS) access and indexing
Native MPP query execution engine
AsterixDB (Apache incubator)
Operational
Analytical
Query Language?
SELECT subq1.school, COUNT(1)
FROM (SELECT a.status, a.date, b.school, b.region
FROM status_updates a JOIN profiles b
ON (a.userid = b.userid
AND a.date='2009-03-20' )) subq1
GROUP BY subq1.school
Relational JSON
Nested tuples/collections
Partial/missing schema
Heterogeneity
Complex values
Replace COUNT(1) with "(select * from subq1 order by date limit 3)";
"school" is not in the schema of the "profiles" table;
"school" is missing in some profiles;
"school" is a nested tuple.
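Two of these cases can be sketched in Python: replacing the scalar COUNT(1) with a subquery that returns a collection per group, and grouping rows in which "school" is absent. The rows in `subq1` are hypothetical (the date filter of the original query is left out so the ordering is visible), `top_rows_per_school` is an illustrative name, and mapping an absent field to `None` is just one possible choice.

```python
# Hypothetical joined rows standing in for subq1; the last one has no "school".
subq1 = [
    {"status": "a", "date": "2009-03-20", "school": "UCI"},
    {"status": "b", "date": "2009-03-18", "school": "UCI"},
    {"status": "c", "date": "2009-03-19", "school": "UCI"},
    {"status": "d", "date": "2009-03-17", "school": "UCI"},
    {"status": "e", "date": "2009-03-20"},
]


def top_rows_per_school(rows, limit=3):
    """Per group, return a nested collection instead of a scalar aggregate."""
    groups = {}
    for row in rows:
        # dict.get yields None for an absent field, so the row still lands in a group.
        groups.setdefault(row.get("school"), []).append(row)
    return [
        {
            "school": school,
            # Emulates: (select * from subq1 order by date limit 3) within the group.
            "earliest": sorted(members, key=lambda r: r["date"])[:limit],
        }
        for school, members in groups.items()
    ]


for group in top_rows_per_school(subq1):
    print(group["school"], [r["status"] for r in group["earliest"]])
```

Each output row now carries a nested collection, which a flat relational result cannot represent.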
Language Comparison: Data Model
System | Top-level Values | Heterogeneity | Arrays | Bags | Maps | Nested Tuples | Primitive Values
Hive | Bags/Tuples | ✖ | ✔ | ✖ | P | ✔ | ✔
Impala | Bags/Tuples | ✖ | ✖ | ✖ | ✖ | ✖ | ✔
Spark SQL | Bags/Tuples | ✖ | ✔ | ✖ | ✔ | ✔ | ✔
Drill | Bags/Tuples | ✖ | ✔ | ✖ | ✔ | ✔ | ✔
N1QL | Bags/Tuples | ✔ | ✔ | ✖ | ✖ | ✔ | ✔
Cassandra | Bags/Tuples | ✖ | P | ✖ | P | ✖ | ✔
MongoDB | Bags/Tuples | ✔ | ✔ | ✖ | ✖ | ✔ | ✔
AsterixDB | Any Values | ✔ | ✖ | ✔ | ✖ | ✔ | ✔
Language Comparison: Types
System | Dynamic Type Check | Static Type Check | Any Type | Open Type | Union Type | Optional
Hive | ✖ | ✔ | ✖ | ✖ | ✖ | ✖
Impala | ✖ | ✔ | ✖ | ✖ | ✖ | ✖
Spark SQL | ✖ | ✔ | ✖ | ✖ | ✖ | ✖
Drill | ✖ | ✔ | ✖ | ✖ | ✖ | ✖
N1QL | ✔ | ✖ | – | –
Cassandra | ✖ | ✔ | ✖ | ✖ | ✖ | ✖
MongoDB | ✔ | ✖ | – | –
AsterixDB | ✔ | ✔ | ✔ | ✔ | ✖ | ✔
Language Comparison: Path Navigation
System | Tuple Nav. absent | Tuple Nav. mismatch | Array Nav. absent | Array Nav. mismatch | Map Nav. absent | Map Nav. mismatch
Hive | error | error | null | error | null | error
Impala | error | error | -- | -- | -- | --
Spark SQL | error | error | error | error | null | error
Drill | error | error | error | error | null | error
N1QL | missing | missing | missing | missing | -- | --
Cassandra | error | error | -- | -- | -- | --
MongoDB | missing | missing | -- | -- | -- | --
AsterixDB | null | error | error | error | -- | --
No Errors!
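The tuple-navigation columns of this table can be read as a per-system policy. The sketch below encodes three of the rows as configuration and applies them to the same navigation step; `tuple_nav`, `CONFIGS`, `MISSING`, and `NavigationError` are hypothetical names, and Python's `None` stands in for null.

```python
class _Missing:
    """Sentinel type for a MISSING result, distinct from null (None)."""

    def __repr__(self):
        return "MISSING"


MISSING = _Missing()


class NavigationError(Exception):
    """Raised when a system's policy for a failed navigation is 'error'."""


# Tuple-navigation columns of the table for three of its rows.
CONFIGS = {
    "Hive": {"absent": "error", "mismatch": "error"},
    "N1QL": {"absent": "missing", "mismatch": "missing"},
    "AsterixDB": {"absent": "null", "mismatch": "error"},
}


def _outcome(policy, reason):
    if policy == "error":
        raise NavigationError(reason)
    return MISSING if policy == "missing" else None


def tuple_nav(value, field, config):
    """Evaluate value.field under the given navigation policy."""
    if not isinstance(value, dict):
        # Type mismatch: navigating into something that is not a tuple.
        return _outcome(config["mismatch"], f"cannot navigate into {value!r}")
    if field not in value:
        # Absent: the tuple has no such attribute.
        return _outcome(config["absent"], f"no field {field!r}")
    return value[field]


doc = {"abv": 9.5, "brewery_id": "brouwerij"}
print(tuple_nav(doc, "category", CONFIGS["N1QL"]))
print(tuple_nav(doc, "category", CONFIGS["AsterixDB"]))
print(tuple_nav(doc["abv"], "m1", CONFIGS["N1QL"]))
```

Under the Hive policy the same calls raise `NavigationError`, which is the behavior the "No Errors!" callout contrasts with.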
Language Comparison: SELECT Clause
System | Project Tuples with Non-scalar Subqueries | Project Tuples with Nested Collections | Project Non-Tuples
Hive | ✖ | ✔ | ✖
Impala | ✖ | ✖ | ✖
Spark SQL | ✖ | ✔ | ✖
Drill | ✖ | ✔ | ✖
N1QL | ✔ | ✔ | ✔
Cassandra | ✖ | ✖ | ✖
MongoDB | ✖ | ✔ | ✔
AsterixDB | ✔ | ✔ | ✔
Language Comparison: FROM Clause
System | Subquery | Joins | Inner Unnest | Outer Unnest | Ordinal Positions
Hive | ✔ | ✔ | ✔ | ✔ | ✔
Impala | ✔ | ✔ | ✖ | ✖ | ✖
Spark SQL | ✔ | ✔ | ✖ | ✖ | ✖
Drill | ✔ | ✔ | ✔ | ✖ | ✖
N1QL | ✔ | ✔ | ✔ | ✔ | ✖
Cassandra | ✖ | ✖ | ✖ | ✖ | ✖
MongoDB | ✖ | ✖ | ✔ | ✖ | ✖
AsterixDB | ✔ | ✔ | ✔ | ✖ | ✔
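The difference between the Inner Unnest and Outer Unnest columns is easiest to see on a small input. The following sketch uses hypothetical documents and function names; it shows the general semantics (outer unnest preserves parents with empty or absent arrays), not any one system's implementation.

```python
def inner_unnest(docs, array_field, alias):
    """Emit one row per array element; parents without elements disappear."""
    for doc in docs:
        items = doc.get(array_field)
        if isinstance(items, list):
            for item in items:
                yield {**doc, alias: item}


def outer_unnest(docs, array_field, alias):
    """Like inner_unnest, but keep parents whose array is empty or absent."""
    for doc in docs:
        items = doc.get(array_field)
        if isinstance(items, list) and items:
            for item in items:
                yield {**doc, alias: item}
        else:
            # Keep the parent document; the alias is simply left out.
            yield dict(doc)


# Hypothetical documents: a populated array, an empty array, and no array.
orders = [
    {"id": 1, "items": ["a", "b"]},
    {"id": 2, "items": []},
    {"id": 3},
]

print([(r["id"], r.get("item")) for r in inner_unnest(orders, "items", "item")])
print([(r["id"], r.get("item")) for r in outer_unnest(orders, "items", "item")])
```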
JSON data model
INNER/OUTER FLATTEN CLAUSE
Arbitrary subqueries in SELECT
Configurable parameters for semantics
  Path navigations
  Equality evaluations
  Collection coercions
SQL++ (The “++” Part)
Supported by N1QL!
Made consistent in N1QL!
SQL++ Configuration for N1QL
Configuration | Parameter | Value | Parameter | Value
@path | tuple_nav.absent | missing | tuple_nav.type_mismatch | missing
@path | array_nav.absent | missing | array_nav.type_mismatch | missing
@path | map_nav.absent | missing | map_nav.type_mismatch | missing
@eq | complex | yes | type_mismatch | false
@eq | null_eq_null | null | null_eq_value | null
@eq | null_eq_missing | missing | missing_eq_missing | missing
@eq | missing_eq_value | missing | null_and_missing | missing
@eq | null_and_true | null | null_and_null | null
@eq | missing_and_true | missing | missing_and_missing | missing
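Read as a specification, the equality parameters above describe a small decision procedure. The Python sketch below follows only the `*_eq_*`, `type_mismatch`, and `complex` entries of the table; the names `n1ql_eq`, `MISSING`, and `_kind` are hypothetical, `None` stands in for null, and nested values are compared with Python's own `==` as a simplification.

```python
class _Missing:
    """Sentinel type for MISSING, distinct from null (None)."""

    def __repr__(self):
        return "MISSING"


MISSING = _Missing()


def _kind(value):
    """Coarse JSON type of a value; bool is checked before number on purpose."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def n1ql_eq(left, right):
    """Equality following the @eq parameters listed in the table."""
    # missing_eq_missing, missing_eq_value, null_eq_missing -> missing
    if left is MISSING or right is MISSING:
        return MISSING
    # null_eq_null, null_eq_value -> null
    if left is None or right is None:
        return None
    # type_mismatch -> false
    if _kind(left) != _kind(right):
        return False
    # complex = yes: arrays and objects are compared structurally.
    return left == right


print(n1ql_eq(MISSING, 1))
print(n1ql_eq(None, None))
print(n1ql_eq(1, "1"))
print(n1ql_eq({"a": [1, 2]}, {"a": [1, 2]}))
```

Checking for MISSING before null is what makes `null_eq_missing` come out as missing rather than null.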
Summary
N1QL in a Bigger Context
Operational Query Processing
  Rich Data Model
  SQL is BACK, but with EXTENSIONS!
Analytical Query Processing
  Rich Data Model is a MUST!
Unification
  The trend!
Summary
Thank you. Q & A