The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012

Five questions for your NoSQL solution!

Jonathan Ellis
CTO, DataStax
Project Chair, Apache Cassandra

How do I model my application?

Popular options

• Key/value

• Tabular

• Document

• Graph?

Schema is your friend

{ "id": "e451dd42-ece3-11e1-a0a3-34159e154f4c", "name": "jbellis", "state": "TX", "birthdate": "1/1/1976", "email_addresses": ["jbellis@gmail.com", "jbellis@datastax.com"] }

SQL can be your friend too

CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date date);

CREATE INDEX ON users(state);

SELECT * FROM users WHERE state = 'Texas' AND birth_date > '1950-01-01';

CREATE TABLE users_addresses ( user_id uuid REFERENCES users, email text);

SELECT * FROM users JOIN users_addresses ON users.id = users_addresses.user_id;
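To see the relational version run end to end, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no uuid or date types, so TEXT stands in for both (an assumption of this sketch, not part of the slide's schema). The join is written with an explicit ON clause: users.id and users_addresses.user_id have different names, so a NATURAL JOIN would find no shared column and degrade to a cross join.

```python
import sqlite3

# In-memory database; TEXT stands in for uuid and date, which SQLite lacks.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT, state TEXT, birth_date TEXT);
CREATE TABLE users_addresses (user_id TEXT REFERENCES users(id), email TEXT);
""")

uid = "e451dd42-ece3-11e1-a0a3-34159e154f4c"
conn.execute("INSERT INTO users VALUES (?, ?, ?, ?)", (uid, "jbellis", "TX", "1976-01-01"))
conn.executemany(
    "INSERT INTO users_addresses VALUES (?, ?)",
    [(uid, "jbellis@gmail.com"), (uid, "jbellis@datastax.com")],
)

# One output row per (user, address) pair: the user's columns repeat on each row.
rows = conn.execute(
    "SELECT u.name, a.email FROM users u "
    "JOIN users_addresses a ON a.user_id = u.id ORDER BY a.email"
).fetchall()
print(rows)
```

Every read of a user's addresses pays for this join, which is the cost the collection type below avoids.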

Collections


CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date date, email_addresses set<text>);

UPDATE users SET email_addresses = email_addresses + {'jbellis@gmail.com', 'jbellis@datastax.com'} WHERE id = e451dd42-ece3-11e1-a0a3-34159e154f4c;
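The set column replaces the second table: adding addresses becomes a single update on the user's row. A minimal in-memory model of that append, assuming a Python dict stands in for the table (users and add_emails are hypothetical names), shows why repeating the update is harmless:

```python
# Hypothetical in-memory model: row key -> {"email_addresses": set of text}.
users = {}

def add_emails(user_id, emails):
    """Model of the CQL set append `email_addresses = email_addresses + {...}`.

    The new elements are unioned into the stored set, so re-adding an
    address that is already present changes nothing.
    """
    row = users.setdefault(user_id, {"email_addresses": set()})
    row["email_addresses"] |= set(emails)

uid = "e451dd42-ece3-11e1-a0a3-34159e154f4c"
add_emails(uid, {"jbellis@gmail.com", "jbellis@datastax.com"})
add_emails(uid, {"jbellis@gmail.com"})  # already present: no duplicate
print(sorted(users[uid]["email_addresses"]))
```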


Joins don’t scale

• No joins

• No subqueries

• No aggregation functions* or GROUP BY

• ORDER BY?

SELECT * FROM tweets WHERE user_id IN (SELECT follower FROM followers WHERE user_id = 'driftx');

[Diagram: the followers and tweets tables]

CREATE TABLE timeline ( user_id uuid, tweet_id timeuuid, tweet_author uuid, tweet_body text, PRIMARY KEY (user_id, tweet_id));

Clustering in Cassandra

user_id   tweet_id     tweet_author   tweet_body
jbellis   3290f9da..   rbranson       lorem
jbellis   3895411a..   tjake          ipsum
...       ...          ...            ...
driftx    3290f9da..   rbranson       lorem
driftx    71b46a84..   yzhang         dolor
...       ...          ...            ...
yukim     3290f9da..   rbranson       lorem
yukim     e451dd42..   tjake          amet
...       ...          ...            ...


SELECT * FROM timeline WHERE user_id = 'driftx';
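The timeline table trades the subquery for denormalization: each tweet is copied into the partition of every follower when it is written, so reading a timeline touches one partition whose rows are already ordered by tweet_id. A minimal sketch of that fan-out on write, assuming in-memory Python structures and small integers in place of timeuuids to keep the ordering obvious; post_tweet, read_timeline and the followers map are hypothetical:

```python
import bisect
from collections import defaultdict

# Hypothetical stand-in for the timeline table: partition key (user_id) ->
# rows (tweet_id, tweet_author, tweet_body) kept sorted by the clustering column tweet_id.
timeline = defaultdict(list)
followers = {"rbranson": ["jbellis", "driftx", "yukim"], "yzhang": ["driftx"]}

def post_tweet(author, body, tweet_id):
    # Fan out on write: copy the tweet into each follower's partition,
    # so reads need neither a join nor a subquery.
    for user in followers.get(author, []):
        bisect.insort(timeline[user], (tweet_id, author, body))

def read_timeline(user_id):
    # Like SELECT * FROM timeline WHERE user_id = ...: one partition, already in order.
    return list(timeline.get(user_id, []))

post_tweet("rbranson", "lorem", 1)
post_tweet("yzhang", "dolor", 2)
print(read_timeline("driftx"))
```

The cost moves to write time (one insert per follower), which suits a log-structured engine where writes are cheap.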

How does it perform?

VLDB benchmark

Locking

Efficiency

UPDATE users SET email_addresses = email_addresses + {...} WHERE user_id = 'jbellis';

Durability

Log-structured storage engine

[Diagram sequence: the memtable lives in memory and the commit log on the hard drive. Each write, write(k1, c1:v1), write(k1, c2:v2), write(k2, c1:v1 c2:v2) and write(k1, c1:v4 c3:v3), is appended to the commit log and applied to the memtable, where the newer c1:v4 for k1 replaces c1:v1. The memtable is then written to the hard drive as an SSTable (rows k1 and k2) together with an index and a bloom filter (BF), after which the commit log is cleaned up.]

No random writes
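The write path above can be sketched in a few lines. This is a toy model under stated assumptions, not Cassandra's code: dicts stand in for the memtable and SSTables, a list stands in for the commit log, and the index and bloom filter are omitted. It shows the two properties the slides stress: a write only appends to the log and updates memory, and a flush writes each sorted table out once.

```python
class LogStructuredStore:
    """Toy model of the memtable / commit log / SSTable write path (not Cassandra's code)."""

    def __init__(self, flush_threshold=2):
        self.commit_log = []  # append-only list standing in for the sequential on-disk log
        self.memtable = {}    # in memory: row key -> {column: value}
        self.sstables = []    # immutable flushed tables, oldest first
        self.flush_threshold = flush_threshold  # flush once the memtable holds this many row keys

    def write(self, key, columns):
        self.commit_log.append((key, dict(columns)))       # durability: sequential append
        self.memtable.setdefault(key, {}).update(columns)  # newest value per column wins
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the whole memtable out once, sorted by row key, then start fresh.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []  # cleanup: everything logged is now in an SSTable

    def read(self, key):
        # Apply tables oldest to newest so newer column values overwrite older ones.
        result = {}
        for table in self.sstables + [self.memtable]:
            result.update(table.get(key, {}))
        return result


# Replay the four writes from the slides, then flush.
store = LogStructuredStore(flush_threshold=10)
store.write("k1", {"c1": "v1"})
store.write("k1", {"c2": "v2"})
store.write("k2", {"c1": "v1", "c2": "v2"})
store.write("k1", {"c1": "v4", "c3": "v3"})
logged = len(store.commit_log)  # one log entry per write
store.flush()
print(store.read("k1"))
```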

The gory details

Larger than memory datasets

How does it handle failure?

Classic partitioning with a SPOF (single point of failure)

[Diagram: a client sends requests through a single router to partitions 1, 2, 3 and 4]

Availability

• “High availability implies that a single fault will not bring down your system. Not ‘we’ll recover quickly.’” -- Ben Coverston, DataStax

• “The biggest problem with failover is that you're almost never using it until it really hurts. It's like backups that you never test.” -- Rick Branson, Instagram

Fully distributed, no SPOF

Multiple datacenters

Self-healing

[Diagram sequence: the client sends a request to a coordinator node, which forwards internal requests to the replicas and sends the response back to the client once they answer. When a replica fails, the internal request to it times out; the coordinator still responds to the client and stores a hint for the failed replica.]
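The failure path above is hinted handoff. A minimal sketch, assuming hypothetical Replica and Coordinator classes and treating a down replica as a timeout. In Cassandra, whether the client sees success also depends on the requested consistency level; this sketch reduces that to returning the number of acknowledgements.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def apply(self, key, value):
        if not self.up:
            raise TimeoutError(f"{self.name} did not respond")
        self.data[key] = value


class Coordinator:
    """Toy hinted-handoff sketch; names and structure are illustrative only."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (replica, key, value) writes still owed to a replica

    def write(self, key, value):
        acks = 0
        for replica in self.replicas:
            try:
                replica.apply(key, value)
                acks += 1
            except TimeoutError:
                # Timed out: keep a hint instead of failing the whole request.
                self.hints.append((replica, key, value))
        return acks  # the client still gets a response

    def replay_hints(self):
        # Deliver missed writes to replicas that are back; keep the rest.
        pending = []
        for replica, key, value in self.hints:
            if replica.up:
                replica.apply(key, value)
            else:
                pending.append((replica, key, value))
        self.hints = pending


a, b, c = Replica("a"), Replica("b"), Replica("c")
coord = Coordinator([a, b, c])
c.up = False
acks = coord.write("k1", "v1")  # c times out and a hint is stored
c.up = True
coord.replay_hints()
print(acks, c.data)
```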

Other healing modes

• AntiEntropyService

• Read repair
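Read repair compares the copies returned by the replicas and writes the newest one back to any replica that is stale. A minimal sketch, assuming each replica is a dict mapping a key to a (timestamp, value) pair and that the highest timestamp wins; read_with_repair is a hypothetical name:

```python
def read_with_repair(replicas, key):
    """Return the newest value for key and write it back to stale replicas.

    Each replica is a dict: key -> (timestamp, value). Returns None if no replica has the key.
    """
    responses = [replica.get(key) for replica in replicas]
    newest = max(
        (resp for resp in responses if resp is not None),
        key=lambda ts_value: ts_value[0],
        default=None,
    )
    if newest is None:
        return None
    for replica, resp in zip(replicas, responses):
        if resp != newest:
            replica[key] = newest  # repair the stale or missing copy
    return newest[1]


r1 = {"k1": (2, "v2")}
r2 = {"k1": (1, "v1")}  # stale
r3 = {}                 # missing
value = read_with_repair([r1, r2, r3], "k1")
print(value, r2, r3)
```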

Dynamic snitch (dealing with partial failure)

[Diagram: a client and a coordinator, with replicas that are 40%, 90% and 30% busy]
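The idea behind the dynamic snitch is that the coordinator prefers replicas that are currently responding well instead of treating them all as equal. The sketch below simply ranks replicas by the busy figures in the diagram; the real snitch's scoring is not reproduced here, and rank_replicas and the replica names are hypothetical:

```python
def rank_replicas(busy):
    """Order replicas from least to most busy; reads go to the front of the list."""
    return sorted(busy, key=busy.get)


busy = {"replica1": 0.40, "replica2": 0.90, "replica3": 0.30}
order = rank_replicas(busy)
print(order)
```

The 90%-busy replica is still alive, so a plain up/down failure detector would keep sending it reads; ranking by current load routes around that partial failure.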

How does it scale?

VLDB benchmark

Scaling antipatterns

• Metadata servers

• Router bottlenecks

• Overloading existing nodes when adding capacity
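One way to sidestep these antipatterns is to place nodes on a consistent-hashing ring, so that any node can compute a key's owner locally with no metadata server or router, and to give each node several tokens (what Cassandra calls virtual nodes) so that a newcomer tends to take over small ranges from several existing nodes. A minimal sketch under those assumptions; the MD5-based token function, the Ring class and the tokens_per_node value are illustrative, not Cassandra's partitioner:

```python
import bisect
import hashlib


def token(s):
    # Map a string to a position on the ring (MD5 digest read as an integer).
    return int(hashlib.md5(s.encode()).hexdigest(), 16)


class Ring:
    """Toy consistent-hashing ring with several tokens per node."""

    def __init__(self, nodes, tokens_per_node=8):
        self.ring = sorted(
            (token(f"{node}#{i}"), node) for node in nodes for i in range(tokens_per_node)
        )
        self.points = [t for t, _ in self.ring]

    def owner(self, key):
        # First token at or after the key's position, wrapping around the ring.
        i = bisect.bisect_left(self.points, token(key)) % len(self.ring)
        return self.ring[i][1]


keys = [f"user{i}" for i in range(1000)]
before = Ring(["n1", "n2", "n3"])
after = Ring(["n1", "n2", "n3", "n4"])
moved = [k for k in keys if before.owner(k) != after.owner(k)]
# How many keys moved, and which existing nodes gave them up.
print(len(moved), sorted({before.owner(k) for k in moved}))
```

Adding n4 never shuffles keys between the existing nodes: every key that changes owner moves to n4.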

How flexible is it?

Data model: Realtime

LiveStocks
stock   last
GOOG    $95.52
AAPL    $186.10
AMZN    $112.98

StockHist
stock   date         price
GOOG    2011-01-01   $8.23
GOOG    2011-01-02   $6.14
GOOG    2011-01-03   $7.78

Portfolios
user      stock   shares
jbellis   GOOG    80
jbellis   LNKD    20
yukim     AMZN    100

Data model: Analytics

HistLoss
portfolio    worst_date   loss
Portfolio1   2011-07-23   -$34.81
Portfolio2   2011-03-11   -$11432.24
Portfolio3   2011-05-21   -$1476.93

Data model: Analytics

10dayreturns
stock   rdate        return
GOOG    2011-07-25   $8.23
GOOG    2011-07-24   $6.14
GOOG    2011-07-23   $7.78
AAPL    2011-07-25   $15.32
AAPL    2011-07-24   $12.68

INSERT OVERWRITE TABLE 10dayreturns
SELECT a.stock, b.date AS rdate, b.price - a.price
FROM StockHist a JOIN StockHist b
  ON (a.stock = b.stock AND date_add(a.date, 10) = b.date);
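The same self-join can be expressed over an in-memory list, which makes the date arithmetic explicit. A minimal sketch, assuming StockHist rows are (stock, date, price) tuples with at most one price per stock per day; ten_day_returns is a hypothetical name and the sample prices are made up:

```python
from datetime import date, timedelta
from decimal import Decimal


def ten_day_returns(stock_hist):
    """In-memory version of the StockHist self-join.

    stock_hist: (stock, date, price) rows, at most one price per stock per day.
    Returns (stock, rdate, return) rows where rdate is 10 days after an earlier row's date.
    """
    price_on = {(stock, day): price for stock, day, price in stock_hist}
    result = []
    for stock, day, price in stock_hist:
        later = day + timedelta(days=10)  # plays the role of date_add(a.date, 10)
        if (stock, later) in price_on:
            result.append((stock, later, price_on[(stock, later)] - price))
    return result


# Made-up sample prices, for illustration only.
hist = [
    ("GOOG", date(2011, 7, 13), Decimal("10.00")),
    ("GOOG", date(2011, 7, 23), Decimal("12.50")),
]
returns = ten_day_returns(hist)
print(returns)
```

Decimal is used so that price differences are exact rather than binary floating point.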

portfolio_returns
portfolio    rdate        preturn
Portfolio1   2011-07-25   $118.21
Portfolio1   2011-07-24   $60.78
Portfolio1   2011-07-23   -$34.81
Portfolio2   2011-07-25   $2143.92
Portfolio3   2011-07-24   -$10.19

INSERT OVERWRITE TABLE portfolio_returns
SELECT portfolio, rdate, SUM(b.return)
FROM portfolios a JOIN 10dayreturns b ON (a.stock = b.stock)
GROUP BY portfolio, rdate;
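The portfolio step is a join on stock followed by a sum per (portfolio, rdate). A minimal sketch with hypothetical portfolio membership (portfolio_returns is a hypothetical function name); like the query, it does not weight returns by the number of shares held:

```python
from collections import defaultdict
from decimal import Decimal


def portfolio_returns(portfolios, ten_day_returns):
    """portfolios: (portfolio, stock) rows; ten_day_returns: (stock, rdate, return) rows.

    Joins on stock and sums returns per (portfolio, rdate), like the GROUP BY query.
    """
    by_stock = defaultdict(list)
    for stock, rdate, ret in ten_day_returns:
        by_stock[stock].append((rdate, ret))
    totals = defaultdict(Decimal)  # Decimal() is Decimal("0")
    for portfolio, stock in portfolios:
        for rdate, ret in by_stock[stock]:
            totals[(portfolio, rdate)] += ret
    return dict(totals)


# Hypothetical membership; the returns are rows from the 10dayreturns table.
members = [("Portfolio1", "GOOG"), ("Portfolio1", "AAPL")]
returns = [
    ("GOOG", "2011-07-25", Decimal("8.23")),
    ("GOOG", "2011-07-24", Decimal("6.14")),
    ("AAPL", "2011-07-25", Decimal("15.32")),
]
print(portfolio_returns(members, returns))
```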

INSERT OVERWRITE TABLE HistLoss
SELECT a.portfolio, rdate, minp
FROM (
  SELECT portfolio, min(preturn) AS minp
  FROM portfolio_returns
  GROUP BY portfolio
) a JOIN portfolio_returns b ON (a.portfolio = b.portfolio AND a.minp = b.preturn);
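The last query keeps each portfolio's minimum return together with the date it occurred. A minimal single-pass sketch (hist_loss is a hypothetical name); unlike the join on (portfolio, preturn), it keeps only the first date when two dates tie for the worst return. The input rows are the portfolio_returns values listed earlier:

```python
from decimal import Decimal


def hist_loss(portfolio_returns):
    """portfolio_returns: (portfolio, rdate, preturn) rows -> {portfolio: (worst_date, loss)}."""
    worst = {}
    for portfolio, rdate, preturn in portfolio_returns:
        # Keep the lowest return seen so far; on a tie the earlier row is kept.
        if portfolio not in worst or preturn < worst[portfolio][1]:
            worst[portfolio] = (rdate, preturn)
    return worst


rows = [
    ("Portfolio1", "2011-07-25", Decimal("118.21")),
    ("Portfolio1", "2011-07-24", Decimal("60.78")),
    ("Portfolio1", "2011-07-23", Decimal("-34.81")),
    ("Portfolio2", "2011-07-25", Decimal("2143.92")),
    ("Portfolio3", "2011-07-24", Decimal("-10.19")),
]
print(hist_loss(rows))
```

With only these five rows, Portfolio1's result matches its HistLoss entry (2011-07-23, -$34.81); the HistLoss entries for Portfolio2 and Portfolio3 come from dates not included in this sample.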


Some Cassandra users

Questions?

