State of Cassandra, 2012 - NoSQL | Apache Cassandra · State of Cassandra, 2012 Jonathan Ellis...

Post on 04-Oct-2018


State of Cassandra, 2012
Jonathan Ellis
Project Chair, Apache Cassandra
CTO, DataStax
@spyced

©2012 DataStax

Some Cassandra users, early 2011


Some Cassandra users, mid 2012


eBay

Application/Use Case
• Social Signals: like/want/own features for eBay product and item pages
• Hunch taste graph for eBay users and items
• Many time series use cases

Why Cassandra?
• Multi-datacenter
• Scalable
• Write performance
• Distributed counters
• Hadoop support


Time series data


Multi-datacenter support


Distributed counters
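Counters are hard to distribute because replicas accept increments independently, so a plain read-modify-write on a single value would lose updates. Below is a simplified model of the per-node sharding idea: each node only ever changes its own (clock, count) shard, replicas reconcile by keeping the newest shard per node, and the counter's value is the sum of all shards. The `ShardedCounter` class and the node names are hypothetical, and Cassandra's real counter context format and write path differ in detail.

```python
from dataclasses import dataclass, field


@dataclass
class ShardedCounter:
    """Simplified distributed counter: one (clock, count) shard per node.

    Hypothetical illustration; not Cassandra's on-disk counter format.
    """
    shards: dict[str, tuple[int, int]] = field(default_factory=dict)

    def increment(self, node: str, delta: int = 1) -> None:
        # Only the node that accepted the write touches its own shard,
        # bumping the shard's clock so the newer state wins on merge.
        clock, count = self.shards.get(node, (0, 0))
        self.shards[node] = (clock + 1, count + delta)

    def merge(self, other: "ShardedCounter") -> "ShardedCounter":
        # Reconcile two replicas: for each node keep the shard with the higher clock.
        merged = dict(self.shards)
        for node, shard in other.shards.items():
            if node not in merged or shard[0] > merged[node][0]:
                merged[node] = shard
        return ShardedCounter(merged)

    def value(self) -> int:
        # The counter's value is the sum of every node's partial count.
        return sum(count for _, count in self.shards.values())


# Two replicas accept increments independently, then exchange state.
a = ShardedCounter()
b = ShardedCounter()
a.increment("node1", 3)
b.increment("node2", 2)
b.increment("node2", 1)
merged = a.merge(b)
print(merged.value())  # 6
```

Because merging keeps one shard per node instead of adding values together, replaying the same merge twice does not double-count.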


Hadoop support


Disney

Application/Use Case
• Meet the data management needs of user-facing applications across The Walt Disney Company with a single platform

Why Cassandra?
• DataStax Enterprise can tackle real-time and search functions in the same cluster
• Scalability
• 24x7 uptime


Multitenancy


Enterprise search


SimpleReach

Application/Use Case
• SimpleReach tracks social actions for content creators, from Twitter and Facebook to Pinterest and Reddit, to deliver detailed insights and clear metrics around social behavior.

Why Cassandra?
• Very high-velocity data ingest and large data volumes
• Workload separation between realtime and batch applications


SourceNinja

Application/Use Case
• SourceNinja notifies you of performance, security, and bug fixes for the software you depend on

Why Cassandra?
• Previous database system could not handle the load; HBase had too many points of failure and was too slow
• Fast real-time capabilities, batch analytics on that data, and enterprise search


Realtime + search + analytics = DataStax Enterprise


Netflix

Application/Use Case
• General-purpose backend for large-scale, highly available, cloud-based web services supporting Netflix Streaming

Why Cassandra?
• Highly available, highly robust, and no downtime for schema changes
• Highly scalable, optimized for SSD
• Much lower cost than previous Oracle and SimpleDB implementations
• Flexible data model
• Ability to directly influence/implement OSS feature set
• Supports local and wide-area distributed operations, spanning US and Europe


Optimized for SSD


Open source


Use case patterns

• Massively scalable
• High performance
• Reliable/Available

[Chart: reads/s and writes/s for Cassandra 0.6 vs. Cassandra 1.0, on an axis from 0 to 35,000]

Recent Cassandra history
• 0.7 (Jan 2011)
  • CREATE COLUMN FAMILY
  • TTL
  • Secondary (column) indexes
• 0.8 (Jun 2011)
  • Counters
  • Automatic memtable tuning
• 1.0 (Oct 2011)
  • Compression
  • Leveled compaction
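TTL, listed under 0.7, lets a column expire on its own. A minimal sketch of the read-side rule, assuming each column records when it was written plus an optional TTL in seconds, and that an expired column simply reads as absent. `ExpiringColumn` and `read` are hypothetical names, with an explicit `now` argument for determinism; real Cassandra also has to purge expired data during compaction, which this ignores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExpiringColumn:
    value: str
    written_at: float          # seconds since epoch when the column was written
    ttl: Optional[int] = None  # seconds to live; None means the column never expires

    def is_live(self, now: float) -> bool:
        return self.ttl is None or now < self.written_at + self.ttl


def read(row: dict[str, ExpiringColumn], name: str, now: float) -> Optional[str]:
    """Return a column's value, or None if it is missing or its TTL has elapsed."""
    col = row.get(name)
    if col is None or not col.is_live(now):
        return None
    return col.value


row = {
    "session": ExpiringColumn("abc123", written_at=1000.0, ttl=3600),
    "name": ExpiringColumn("jbellis", written_at=1000.0),
}
print(read(row, "session", now=2000.0))  # abc123
print(read(row, "session", now=5000.0))  # None (expired at t=4600)
print(read(row, "name", now=5000.0))     # jbellis
```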


Present
• 1.1 (Apr 2012)
  • Self-tuning row + key caches
  • Support for mixed SSD + HDD nodes
  • Row-level isolation


Self-tuning Row Cache

[Diagram: without a cache, a client read merges the row from several SSTables; with the row cache, the client read is served directly from the cached row]
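The row cache diagram contrasts two read paths. A minimal sketch, assuming a row is stored as fragments in several SSTables, that merging keeps the newest timestamp per column (last write wins), and that the cache is a fixed-size LRU invalidated on write. `RowCacheStore`, `merge_fragments`, and the entry-count capacity are hypothetical simplifications, not Cassandra's actual classes or its 1.1 memory-based cache sizing.

```python
from collections import OrderedDict

Row = dict[str, tuple[int, str]]  # column name -> (timestamp, value)


def merge_fragments(fragments: list[Row]) -> Row:
    """Merge one row's fragments from several SSTables; the newest timestamp wins per column."""
    merged: Row = {}
    for fragment in fragments:
        for name, (ts, value) in fragment.items():
            if name not in merged or ts > merged[name][0]:
                merged[name] = (ts, value)
    return merged


class RowCacheStore:
    def __init__(self, sstables: list[dict[str, Row]], capacity: int = 2):
        self.sstables = sstables  # each SSTable maps row key -> row fragment
        self.cache: OrderedDict[str, Row] = OrderedDict()
        self.capacity = capacity
        self.merges = 0           # how many reads had to go to the SSTables

    def read(self, key: str) -> Row:
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        self.merges += 1
        row = merge_fragments([t[key] for t in self.sstables if key in t])
        self.cache[key] = row
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used row
        return row

    def invalidate(self, key: str) -> None:
        # Simplification: drop the cached row on write so the next read re-merges.
        self.cache.pop(key, None)


sstables = [
    {"u1": {"login": (1, "foo"), "passwd": (1, "foo")}},
    {"u1": {"login": (2, "bar")}},
]
store = RowCacheStore(sstables)
print(store.read("u1"))  # {'login': (2, 'bar'), 'passwd': (1, 'foo')}
store.read("u1")
print(store.merges)      # 1: the second read was a cache hit
```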


Mixed SSD/HDD Support

[Diagram: a client talks to a single Cassandra instance on one node; the instance keeps the user_sessions table on SSD and the user_activity table on HDD]


Row Level Isolation

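Row before the update: Login = Foo, Passwd = Foo
Row after the update: Login = Bar, Passwd = Bar

UPDATE Users
SET login='bar', password='bar'
WHERE key='e29b-41d4'

SELECT login, password
FROM Users
WHERE key='e29b-41d4'

Result of a SELECT running concurrently with the UPDATE:
Cassandra 1.0: Bar, Foo (a partially applied update is visible)
Cassandra 1.1: Bar, Bar (the update is visible completely or not at all)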
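The 1.0 and 1.1 results differ in whether a reader can observe a row while an update's columns are only partly applied. A minimal sketch of both behaviors, using copy-on-write plus a single reference swap as one common way to get the isolated behavior; the `Row` class and the `update_in_place`, `update_isolated`, and `read_midway` helpers are hypothetical, and 1.1's real implementation is not shown here. Each update is a generator so the sketch can pause the writer between columns and run a read at that exact point.

```python
from typing import Iterator


class Row:
    def __init__(self, columns: dict[str, str]):
        self.columns = columns  # readers always go through this one reference

    def read(self, names: list[str]) -> tuple[str, ...]:
        snapshot = self.columns  # a single reference read
        return tuple(snapshot[n] for n in names)


def update_in_place(row: Row, changes: dict[str, str]) -> Iterator[None]:
    """Apply columns one at a time; a reader can run between steps (no isolation)."""
    for name, value in changes.items():
        row.columns[name] = value
        yield  # a concurrent reader may run here


def update_isolated(row: Row, changes: dict[str, str]) -> Iterator[None]:
    """Build a new copy of the row, then publish it with one reference swap."""
    new_columns = dict(row.columns)
    for name, value in changes.items():
        new_columns[name] = value
        yield  # readers still see the old, complete row
    row.columns = new_columns  # the whole update becomes visible at once


def read_midway(update, changes):
    row = Row({"login": "Foo", "passwd": "Foo"})
    steps = update(row, changes)
    next(steps)  # apply the first column, then pause the writer
    seen = row.read(["login", "passwd"])
    for _ in steps:  # let the writer finish
        pass
    return seen, row.read(["login", "passwd"])


changes = {"login": "Bar", "passwd": "Bar"}
print(read_midway(update_in_place, changes))  # (('Bar', 'Foo'), ('Bar', 'Bar'))
print(read_midway(update_isolated, changes))  # (('Foo', 'Foo'), ('Bar', 'Bar'))
```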


ACID


Overloading “consistency”
• ACID consistency = referential integrity
• Distributed system consistency
  • {consistency, availability, partition tolerance}
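The distributed-systems sense of consistency is the one Cassandra lets clients tune per request. With replication factor N, a read that waits for R replicas and a write acknowledged by W replicas must share at least one replica whenever R + W > N, which is why QUORUM reads combined with QUORUM writes see the latest acknowledged write. A small sketch of that arithmetic (the function names are illustrative):

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for QUORUM: a strict majority."""
    return replication_factor // 2 + 1


def replica_sets_overlap(n: int, r: int, w: int) -> bool:
    """True when every set of r read replicas must intersect every set of w write replicas."""
    return r + w > n


n = 3
q = quorum(n)
print(q)                                 # 2
print(replica_sets_overlap(n, q, q))     # True: QUORUM reads + QUORUM writes
print(replica_sets_overlap(n, 1, 1))     # False: ONE + ONE may miss the latest write
print(replica_sets_overlap(n, 1, n))     # True: ONE reads + ALL writes
```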


Future
• 1.2 (Oct 2012?)
  • Concurrent schema changes
  • JBOD support
  • Virtual nodes
  • CQL3
  • Collections


Concurrent Schema Changes

[Diagram: two clients issue schema changes against the same Cassandra cluster at the same time, one running CREATE TABLE X; ... DROP TABLE X; and the other CREATE TABLE Y; ... DROP TABLE Y;]


JBOD support

[Diagram: one Cassandra instance storing data on four separate disks, HDD1 to HDD4]

[Diagram, continued: the same instance with HDD4 marked as failed (X)]
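The idea of JBOD support is that one instance spreads its data over several independent disks instead of a single RAID volume, so that one failed disk (HDD4 in the diagram) need not take the whole node offline. The sketch below assumes a hypothetical placement policy: new SSTables go to the healthy disk with the most free space, and a failed disk is simply excluded from future writes. `DataDirectories` and its methods are illustrative, not Cassandra's actual code.

```python
class DataDirectories:
    def __init__(self, free_bytes: dict[str, int]):
        self.free_bytes = dict(free_bytes)  # disk name -> free space in bytes
        self.failed: set[str] = set()

    def pick_for_write(self, size: int) -> str:
        """Choose the healthy disk with the most free space that can hold `size` bytes."""
        candidates = [d for d, free in self.free_bytes.items()
                      if d not in self.failed and free >= size]
        if not candidates:
            raise OSError("no healthy data directory has enough space")
        disk = max(candidates, key=lambda d: self.free_bytes[d])
        self.free_bytes[disk] -= size
        return disk

    def mark_failed(self, disk: str) -> None:
        # Keep serving from the remaining disks instead of failing the whole node.
        self.failed.add(disk)


dirs = DataDirectories({"hdd1": 500, "hdd2": 800, "hdd3": 300, "hdd4": 900})
print(dirs.pick_for_write(100))  # hdd4
dirs.mark_failed("hdd4")
print(dirs.pick_for_write(100))  # hdd2
```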


Virtual nodes

[Diagram: ring without vnodes, split into six token ranges A to F, one per node; ring with vnodes, split into sixteen smaller ranges A to P]


Node Rebuild without vnodes

[Diagram: ring without vnodes (ranges A to F) and six nodes, Node 1 to Node 6; each node stores its own range plus replicas of the two preceding ranges, e.g. Node 1 holds A, F, and E]


Node Rebuild with vnodes

[Diagram: the same six nodes with vnodes; the ring is split into sixteen ranges A to P, and each node stores replicas of many small ranges scattered around the ring]
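With vnodes, each node owns many small token ranges instead of one large one, so the data needed to rebuild a failed node is spread over many other nodes. A minimal sketch, assuming randomly chosen tokens in a toy token space and SimpleStrategy-style placement in which each range is replicated on the next distinct nodes clockwise. The `num_tokens` parameter name mirrors the cassandra.yaml setting; `build_ring`, `replicas`, and `rebuild_sources` are hypothetical helpers.

```python
import random

TOKEN_SPACE = 2**20  # toy token space; real partitioners use a far larger range


def build_ring(nodes: list[str], num_tokens: int, seed: int = 0) -> list[tuple[int, str]]:
    """Give each node `num_tokens` distinct random tokens; return (token, node) sorted by token."""
    rng = random.Random(seed)
    tokens = rng.sample(range(TOKEN_SPACE), num_tokens * len(nodes))
    ring = [(tokens[i * num_tokens + j], node)
            for i, node in enumerate(nodes) for j in range(num_tokens)]
    return sorted(ring)


def replicas(ring: list[tuple[int, str]], index: int, rf: int) -> list[str]:
    """Replicas of the range ending at ring[index]: walk clockwise, collecting distinct nodes."""
    found: list[str] = []
    i = index
    while len(found) < rf:
        node = ring[i % len(ring)][1]
        if node not in found:
            found.append(node)
        i += 1
    return found


def rebuild_sources(ring: list[tuple[int, str]], failed: str, rf: int) -> set[str]:
    """Nodes holding another replica of some range that the failed node also replicates."""
    sources: set[str] = set()
    for index in range(len(ring)):
        owners = replicas(ring, index, rf)
        if failed in owners:
            sources.update(n for n in owners if n != failed)
    return sources


nodes = [f"node{i}" for i in range(1, 7)]
without_vnodes = rebuild_sources(build_ring(nodes, num_tokens=1), "node1", rf=3)
with_vnodes = rebuild_sources(build_ring(nodes, num_tokens=16), "node1", rf=3)
print(len(without_vnodes))  # 4: node1's two predecessors and two successors
print(sorted(with_vnodes))
```

Without vnodes the rebuild always draws on the same four ring neighbors; with 16 tokens per node the sources typically cover all five surviving nodes, so more machines can share the streaming work.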


CQL: You got SQL in my NoSQL!

CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  state text,
  birth_date int
);

CREATE INDEX ON users(state);

SELECT * FROM users WHERE state='Texas' AND birth_date > 1950;
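`CREATE INDEX ON users(state)` gives Cassandra a way to find rows by a non-key column. Conceptually, a secondary index is an inverted map from column value to row keys, and a query like the SELECT above looks up 'Texas' in the `state` index, then filters those candidate rows on `birth_date`. A minimal in-memory sketch of that idea, with made-up sample rows and hypothetical names; it ignores that Cassandra keeps each node's index locally in a hidden table.

```python
from collections import defaultdict

users = {
    "u1": {"name": "Alice", "state": "Texas", "birth_date": 1962},
    "u2": {"name": "Bob", "state": "Texas", "birth_date": 1941},
    "u3": {"name": "Carol", "state": "Ohio", "birth_date": 1975},
}

# CREATE INDEX ON users(state): map each state value to the set of row keys that have it.
state_index: defaultdict[str, set[str]] = defaultdict(set)
for key, row in users.items():
    state_index[row["state"]].add(key)


def select_by_state(state: str, born_after: int) -> list[str]:
    """WHERE state = ? AND birth_date > ?: index lookup first, then filter the candidates."""
    candidates = state_index.get(state, set())
    return sorted(k for k in candidates if users[k]["birth_date"] > born_after)


print(select_by_state("Texas", 1950))  # ['u1']
```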


Strictly “realtime” focused
• No joins
• No subqueries
• No aggregation functions* or GROUP BY
• Strictly limited ORDER BY


Example: CFS sblocks

create column family sblocks
  with comparator = 'UUIDType'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'UUIDType';


sblocks in context


sblocks in CQL3

CREATE TABLE sblocks (
    block_id uuid,
    subblock_id uuid,
    data blob,
    PRIMARY KEY (block_id, subblock_id)
);

block_id | subblock_id | data
Block1   | subblock A  | data A
Block1   | subblock B  | data B
...      | ...         | ...
Block2   | subblock C  | data C
Block2   | subblock D  | data D
...      | ...         | ...
Block3   | subblock E  | data E
Block3   | subblock F  | data F
...      | ...         | ...
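In the CQL3 table, `block_id` is the partition (row) key and `subblock_id` is a clustering column. Each block's sub-blocks are therefore stored together in one storage row, sorted by `subblock_id`, which is the same layout as the `sblocks` column family defined with cassandra-cli, presented as rows instead. A small sketch of that mapping, assuming plain strings in place of UUIDs and blobs; `insert` and `cql_rows` are hypothetical helpers.

```python
from collections import defaultdict

# Storage view: partition key -> {clustering key -> data}, like the old wide-row column family.
storage: defaultdict[str, dict[str, str]] = defaultdict(dict)


def insert(block_id: str, subblock_id: str, data: str) -> None:
    storage[block_id][subblock_id] = data


def cql_rows(block_id: str) -> list[tuple[str, str, str]]:
    """The CQL3 view: one (block_id, subblock_id, data) row per cell, ordered by subblock_id."""
    cells = storage.get(block_id, {})
    return [(block_id, sub, cells[sub]) for sub in sorted(cells)]


insert("Block1", "subblock B", "data B")
insert("Block1", "subblock A", "data A")
insert("Block2", "subblock C", "data C")
print(cql_rows("Block1"))
# [('Block1', 'subblock A', 'data A'), ('Block1', 'subblock B', 'data B')]
```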


Collections

CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  state text,
  birth_date int
);

✗ Relational approach (not supported in CQL):

CREATE TABLE users_addresses (
  user_id uuid REFERENCES users,
  email text
);

SELECT * FROM users NATURAL JOIN users_addresses;


Collections

CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  state text,
  birth_date int,
  email_addresses set<text>
);

UPDATE users
SET email_addresses = email_addresses + {'jbellis@gmail.com', 'jbellis@datastax.com'}
WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204;  -- hypothetical user id
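A `set<text>` column behaves like a set: adding elements is a union, so re-adding an existing address is harmless, and the update does not need to read the current value first. A small model of `email_addresses = email_addresses + {...}` and the matching removal with `-`, where a plain Python frozenset stands in for the stored collection; `add_to_set` and `remove_from_set` are hypothetical names, and the sketch ignores how Cassandra stores each element internally.

```python
def add_to_set(current: frozenset[str], added: set[str]) -> frozenset[str]:
    """Model of `SET col = col + {...}`: set union, so duplicates collapse."""
    return current | added


def remove_from_set(current: frozenset[str], removed: set[str]) -> frozenset[str]:
    """Model of `SET col = col - {...}`: set difference."""
    return current - removed


emails: frozenset[str] = frozenset()
emails = add_to_set(emails, {"jbellis@gmail.com", "jbellis@datastax.com"})
emails = add_to_set(emails, {"jbellis@gmail.com"})  # adding again changes nothing
print(sorted(emails))  # ['jbellis@datastax.com', 'jbellis@gmail.com']
emails = remove_from_set(emails, {"jbellis@gmail.com"})
print(sorted(emails))  # ['jbellis@datastax.com']
```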