The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012


Session presented at Big Data Spain 2012 Conference 16th Nov 2012 ETSI Telecomunicacion UPM Madrid www.bigdataspain.org More info: http://www.bigdataspain.org/es-2012/conference/top-five-questions-about-nosql/jonathan-ellis

Transcript of The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012

Five questions for your NoSQL solution!
Jonathan Ellis
CTO, DataStax
Project Chair, Apache Cassandra

©2012 DataStax

How do I model my application?

Popular options
• Key/value
• Tabular
• Document
• Graph?

Schema is your friend

{
  "id": "e451dd42-ece3-11e1-a0a3-34159e154f4c",
  "name": "jbellis",
  "state": "TX",
  "birthdate": "1/1/1976",
  "email_addresses": ["jbellis@gmail.com", "jbellis@datastax.com"]
}

SQL can be your friend too

CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  state text,
  birth_date date
);

CREATE INDEX ON users(state);

SELECT * FROM users
WHERE state='Texas' AND birth_date > '1950-01-01';

Collections

CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  state text,
  birth_date date
);

CREATE TABLE users_addresses (
  user_id uuid REFERENCES users,
  email text
);

SELECT *
FROM users NATURAL JOIN users_addresses;

[On the next build of this slide, the join-based design above is crossed out with an X.]

Collections

CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  state text,
  birth_date date,
  email_addresses set<text>
);

UPDATE users
SET email_addresses = email_addresses + {'jbellis@gmail.com', 'jbellis@datastax.com'};
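
The set<text> column replaces the separate users_addresses table with a value stored inside the user row. As a rough illustration of what the `email_addresses + {...}` update means, here is a minimal Python sketch. It is an in-memory model, not Cassandra code; the `add_emails` helper and the dict-based row are hypothetical names introduced only for this example.

```python
def add_emails(row, new_emails):
    """Return a copy of row with new_emails merged into its email set.

    Models `SET email_addresses = email_addresses + {...}` as a set union.
    """
    merged = set(row.get("email_addresses", set())) | set(new_emails)
    return {**row, "email_addresses": merged}


# Hypothetical row, using the column names from the slide.
user = {"name": "jbellis", "state": "TX", "email_addresses": set()}
update = {"jbellis@gmail.com", "jbellis@datastax.com"}

once = add_emails(user, update)
twice = add_emails(once, update)  # applying the same update again changes nothing
```

Because the update is a set union, repeating it leaves the row unchanged, which is one reason a set is a convenient fit for this kind of column.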

Joins don’t scale
• No joins
• No subqueries
• No aggregation functions* or GROUP BY
• ORDER BY?

SELECT * FROM tweets
WHERE user_id IN (SELECT follower FROM followers WHERE user_id = 'driftx')

[Diagram: followers ? tweets]

Clustering in Cassandra

CREATE TABLE timeline (
  user_id uuid,
  tweet_id timeuuid,
  tweet_author uuid,
  tweet_body text,
  PRIMARY KEY (user_id, tweet_id)
);

user_id  tweet_id    tweet_author  tweet_body
jbellis  3290f9da..  rbranson      lorem
jbellis  3895411a..  tjake         ipsum
...      ...         ...
driftx   3290f9da..  rbranson      lorem
driftx   71b46a84..  yzhang        dolor
...      ...         ...
yukim    3290f9da..  rbranson      lorem
yukim    e451dd42..  tjake         amet
...      ...         ...

SELECT * FROM timeline
WHERE user_id = 'driftx';
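
The timeline table avoids the followers/tweets subquery by storing each user's timeline under that user's key, ordered by tweet_id. The following Python sketch models that layout in memory. It assumes tweets are copied into every follower's partition at write time, which the slides imply but do not spell out; the `Timeline` class, the `post_tweet` helper, and the integer tweet ids (standing in for timeuuid values) are all hypothetical.

```python
from bisect import insort
from collections import defaultdict


class Timeline:
    """In-memory model of PRIMARY KEY (user_id, tweet_id)."""

    def __init__(self):
        # partition key (user_id) -> rows kept sorted by clustering key (tweet_id)
        self.partitions = defaultdict(list)

    def insert(self, user_id, tweet_id, author, body):
        insort(self.partitions[user_id], (tweet_id, author, body))

    def select(self, user_id):
        # Equivalent in spirit to: SELECT * FROM timeline WHERE user_id = ?
        return list(self.partitions.get(user_id, []))


def post_tweet(timeline, followers, tweet_id, author, body):
    """Assumed write path: copy the tweet into each follower's partition."""
    for follower in followers.get(author, []):
        timeline.insert(follower, tweet_id, author, body)


# Hypothetical follower lists, reusing names from the slide.
followers = {"rbranson": ["jbellis", "driftx", "yukim"], "yzhang": ["driftx"]}
tl = Timeline()
post_tweet(tl, followers, 2, "yzhang", "dolor")
post_tweet(tl, followers, 1, "rbranson", "lorem")
driftx_rows = tl.select("driftx")
```

Reading a timeline is then a single-partition lookup that returns rows already in tweet_id order, with no join at read time.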

How does it perform?

VLDB benchmark

Locking

Efficiency

UPDATE users
SET email_addresses = email_addresses + {...}
WHERE user_id = 'jbellis';

Durability

Log-structured storage engine

[Diagram, built up over six slides: a Memtable in memory; a Commit log and, after the flush, an SSTable on the hard drive]

write(k1, c1:v1)
  Memtable:   k1 c1:v1
  Commit log: k1 c1:v1

write(k1, c2:v2)
  Memtable:   k1 c1:v1 c2:v2
  Commit log: k1 c1:v1
              k1 c2:v2

write(k2, c1:v1 c2:v2)
  Memtable:   k1 c1:v1 c2:v2
              k2 c1:v1 c2:v2
  Commit log: k1 c1:v1
              k1 c2:v2
              k2 c1:v1 c2:v2

write(k1, c1:v4 c3:v3)
  Memtable:   k1 c1:v4 c2:v2 c3:v3
              k2 c1:v1 c2:v2
  Commit log: k1 c1:v1
              k1 c2:v2
              k2 c1:v1 c2:v2
              k1 c1:v4 c3:v3

flush
  SSTable:    k1 c1:v4 c2:v2 c3:v3
              k2 c1:v1 c2:v2
  index / BF
  cleanup
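
The write path in these slides (append to the commit log, update the memtable, flush to an SSTable) can be sketched as a small in-memory model. This is a simplified illustration, not Cassandra's implementation: the `Node` class is hypothetical, lists stand in for files on disk, newer values win by write order instead of by timestamp, and the "cleanup" step is read here as discarding commit log entries once they have been flushed.

```python
class Node:
    def __init__(self):
        self.commit_log = []   # append-only; stands in for the on-disk commit log
        self.memtable = {}     # key -> {column: value}, held in memory
        self.sstables = []     # immutable, key-sorted snapshots "on disk"

    def write(self, key, columns):
        # Sequential append first, then an in-memory update: no random disk writes.
        self.commit_log.append((key, dict(columns)))
        self.memtable.setdefault(key, {}).update(columns)

    def flush(self):
        # Write the memtable out once, sorted by key, and never modify it again.
        sstable = tuple((k, dict(self.memtable[k])) for k in sorted(self.memtable))
        self.sstables.append(sstable)
        self.memtable = {}
        self.commit_log = []   # assumed meaning of "cleanup" on the slide

    def read(self, key):
        merged = {}
        for sstable in self.sstables:          # oldest first, newer values overwrite
            for k, cols in sstable:
                if k == key:
                    merged.update(cols)
        merged.update(self.memtable.get(key, {}))
        return merged


# The four writes from the slides, followed by a flush.
node = Node()
node.write("k1", {"c1": "v1"})
node.write("k1", {"c2": "v2"})
node.write("k2", {"c1": "v1", "c2": "v2"})
node.write("k1", {"c1": "v4", "c3": "v3"})
log_entries_before_flush = len(node.commit_log)
node.flush()
```

After the flush, the single SSTable holds k1 with c1:v4, c2:v2, c3:v3 and k2 with c1:v1, c2:v2, matching the final slide of the sequence.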

No random writes

The gory details

Larger than memory datasets

How does it handle failure?

Classic partitioning with SPOF

[Diagram: client, router, partition 1, partition 2, partition 3, partition 4]

Availability
• “High availability implies that a single fault will not bring down your system. Not ‘we’ll recover quickly.’” -- Ben Coverston: DataStax
• “The biggest problem with failover is that you're almost never using it until it really hurts. It's like backups that you never test.” -- Rick Branson: Instagram

Fully distributed, no SPOF

[Diagram labels: client, p1, p1, p1, p3, p6]

Multiple datacenters

Self-healing

[Diagram, built up over six slides: Client, Coordinator, Replica]

Normal operation:
1. Client → Coordinator: request
2. Coordinator → Replica: internal request
3. Replica → Coordinator: internal response
4. Coordinator → Client: response

When the replica fails:
1. Client → Coordinator: request
2. Coordinator → Replica: internal request
X  replica fails; timeout
3. Coordinator: hint
4. Coordinator → Client: response
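
The failure sequence in these slides (internal request, timeout, hint, response) can be sketched as follows. This is a toy model of the idea under stated assumptions, not Cassandra's implementation: the `Replica` and `Coordinator` classes, the `replay_hints` method, and the use of Python's built-in `TimeoutError` to represent a timed-out internal request are all choices made for this example.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def apply(self, key, value):
        if not self.up:
            raise TimeoutError(f"{self.name} did not respond")
        self.data[key] = value


class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (replica, key, value) writes waiting to be replayed

    def write(self, key, value):
        """Send the write to every replica; keep a hint for each one that times out."""
        acks = 0
        for replica in self.replicas:
            try:
                replica.apply(key, value)
                acks += 1
            except TimeoutError:
                self.hints.append((replica, key, value))
        return acks

    def replay_hints(self):
        """Retry stored hints; keep the ones whose replica is still down."""
        pending = []
        for replica, key, value in self.hints:
            try:
                replica.apply(key, value)
            except TimeoutError:
                pending.append((replica, key, value))
        self.hints = pending


a, b = Replica("a"), Replica("b")
coord = Coordinator([a, b])
b.up = False
acks = coord.write("k1", "v1")        # replica b times out, so a hint is stored
hints_while_down = len(coord.hints)
b.up = True
coord.replay_hints()                  # b receives the write it missed
```

The client still gets a response while the replica is down, and the missed write is delivered once the replica comes back.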

Other healing modes
• AntiEntropyService
• Read repair
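
The slide only names read repair, so the following is a rough sketch of the general idea rather than a description of Cassandra's code. It assumes each replica stores a (timestamp, value) pair per key and that the newest timestamp wins; the `read_with_repair` function and the dict-based replicas are hypothetical.

```python
def read_with_repair(replicas, key):
    """Read key from every replica, return the newest value, and fix stale copies.

    replicas: list of dicts mapping key -> (timestamp, value).
    """
    answers = [r[key] for r in replicas if key in r]
    if not answers:
        return None
    newest = max(answers, key=lambda tv: tv[0])
    for r in replicas:
        if r.get(key) != newest:
            r[key] = newest  # repair: push the newest version to the stale replica
    return newest[1]


r1 = {"k1": (2, "v4")}
r2 = {"k1": (1, "v1")}   # stale copy
r3 = {}                  # missed the write entirely
value = read_with_repair([r1, r2, r3], "k1")
```

After the read, all three replicas hold the same newest version of the key.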

Dynamic snitch (dealing with partial failure)

[Diagram: Client, Coordinator, and three nodes labeled 40% busy, 90% busy, 30% busy]
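
The slide suggests that the coordinator steers requests away from the busiest node. A minimal sketch of that selection step is below. It assumes the "busy" percentages shown on the slide are the score being compared, which is a simplification made for this example; the `pick_replica` function and the replica names are hypothetical.

```python
def pick_replica(load_by_replica):
    """Return the name of the replica with the lowest reported load."""
    return min(load_by_replica, key=load_by_replica.get)


# Load figures taken from the slide; the replica names are made up.
loads = {"replica_a": 0.40, "replica_b": 0.90, "replica_c": 0.30}
choice = pick_replica(loads)
```

With these figures the request goes to the 30% busy node, and the 90% busy node is avoided without being treated as failed.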

How does it scale?

VLDB benchmark

Scaling antipatterns
• Metadata servers
• Router bottlenecks
• Overloading existing nodes when adding capacity

How flexible is it?

Data model: Realtime

LiveStocks
stock  last
GOOG   $95.52
AAPL   $186.10
AMZN   $112.98

StockHist
stock  date        price
GOOG   2011-01-01  $8.23
GOOG   2011-01-02  $6.14
GOOG   2011-01-03  $7.78

Portfolios
user     stock  shares
jbellis  GOOG   80
jbellis  LNKD   20
yukim    AMZN   100

Data model: Analytics

HistLoss
portfolio   worst_date  loss
Portfolio1  2011-07-23  -$34.81
Portfolio2  2011-03-11  -$11432.24
Portfolio3  2011-05-21  -$1476.93

Data model: Analytics

10dayreturns
stock  rdate       return
GOOG   2011-07-25  $8.23
GOOG   2011-07-24  $6.14
GOOG   2011-07-23  $7.78
AAPL   2011-07-25  $15.32
AAPL   2011-07-24  $12.68

INSERT OVERWRITE TABLE 10dayreturns
SELECT a.stock, b.date as rdate, b.price - a.price
FROM StockHist a JOIN StockHist b
ON (a.stock = b.stock AND date_add(a.date, 10) = b.date);

Data model: Analytics

portfolio_returns
portfolio   rdate       preturn
Portfolio1  2011-07-25  $118.21
Portfolio1  2011-07-24  $60.78
Portfolio1  2011-07-23  -$34.81
Portfolio2  2011-07-25  $2143.92
Portfolio3  2011-07-24  -$10.19

INSERT OVERWRITE TABLE portfolio_returns
SELECT portfolio, rdate, SUM(b.return)
FROM portfolios a JOIN 10dayreturns b
ON (a.stock = b.stock)
GROUP BY portfolio, rdate;

Data model: Analytics

INSERT OVERWRITE TABLE HistLoss
SELECT a.portfolio, rdate, minp
FROM (
  SELECT portfolio, min(preturn) as minp
  FROM portfolio_returns
  GROUP BY portfolio
) a
JOIN portfolio_returns b
ON (a.portfolio = b.portfolio and a.minp = b.preturn);

HistLoss
portfolio   worst_date  loss
Portfolio1  2011-07-23  -$34.81
Portfolio2  2011-03-11  -$11432.24
Portfolio3  2011-05-21  -$1476.93
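
The three analytics queries form a pipeline: StockHist to 10dayreturns, then to portfolio_returns, then to HistLoss. The Python sketch below walks through the same three steps on a tiny in-memory dataset to make the data flow easier to follow. The function names, the dict-based tables, and all prices are made up for this illustration; as in the slide's query, returns are summed per stock without weighting by shares.

```python
from collections import defaultdict
from datetime import date, timedelta


def ten_day_returns(stock_hist):
    """{(stock, date): price} -> {(stock, rdate): price change over the prior 10 days}."""
    returns = {}
    for (stock, day), price in stock_hist.items():
        later = day + timedelta(days=10)
        if (stock, later) in stock_hist:
            returns[(stock, later)] = stock_hist[(stock, later)] - price
    return returns


def portfolio_returns(portfolios, returns):
    """{portfolio: [stock, ...]} -> {(portfolio, rdate): summed return of its stocks}."""
    totals = defaultdict(int)
    for portfolio, stocks in portfolios.items():
        for (stock, rdate), ret in returns.items():
            if stock in stocks:
                totals[(portfolio, rdate)] += ret
    return dict(totals)


def hist_loss(preturns):
    """Return {portfolio: (worst_date, loss)}, the minimum return per portfolio.

    On a tie this keeps the first date seen, whereas the join in the query
    would emit every tied row.
    """
    worst = {}
    for (portfolio, rdate), ret in preturns.items():
        if portfolio not in worst or ret < worst[portfolio][1]:
            worst[portfolio] = (rdate, ret)
    return worst


# Hypothetical prices in whole dollars.
d = date(2011, 7, 1)
hist = {
    ("GOOG", d): 100, ("GOOG", d + timedelta(days=10)): 108,
    ("GOOG", d + timedelta(days=1)): 110, ("GOOG", d + timedelta(days=11)): 104,
    ("AAPL", d): 50, ("AAPL", d + timedelta(days=10)): 45,
}
portfolios = {"Portfolio1": ["GOOG", "AAPL"], "Portfolio2": ["AAPL"]}

returns = ten_day_returns(hist)
preturns = portfolio_returns(portfolios, returns)
losses = hist_loss(preturns)
```

Each step only reads the output of the previous one, which mirrors how the INSERT OVERWRITE statements materialize one table at a time.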

Some Cassandra users