State of Cassandra, 2012

Jonathan Ellis
Project Chair, Apache Cassandra
CTO, DataStax
@spyced

Some Cassandra users, early 2011

Some Cassandra users, mid 2012

Application/Use Case
• Social Signals: like/want/own features for eBay product and item pages
• Hunch taste graph for eBay users and items
• Many time series use cases

Why Cassandra?
• Multi-datacenter
• Scalable
• Write performance
• Distributed counters
• Hadoop support

Time series data

Multi-datacenter support

Distributed counters

Hadoop support

Disney

Application/Use Case
• Meet the data management needs of user-facing applications across The Walt Disney Company with a single platform

Why Cassandra?
• DataStax Enterprise can tackle real-time and search functions in the same cluster
• Scalability
• 24x7 uptime

Multitenancy

Enterprise search

SimpleReach

Application/Use Case
• SimpleReach tracks social actions for content creators, from Twitter and Facebook to Pinterest and Reddit, to deliver detailed insights and clear metrics around social behavior.

Why Cassandra?
• Very high velocity data ingest rate and large data volumes
• Workload separation between realtime and batch applications

SourceNinja

Application/Use Case
• SourceNinja notifies you of performance, security, and bug fixes for the software you depend on

Why Cassandra?
• Previous database system could not handle load; HBase had too many points of failure and was too slow
• Fast real time capabilities, batch analytics on that data, and enterprise search

Realtime + search + analytics = DataStax Enterprise

Netflix

Application/Use Case
• General purpose backend for large scale highly available cloud based web services supporting Netflix Streaming

Why Cassandra?
• Highly available, highly robust and no schema change downtime
• Highly scalable, optimized for SSD
• Much lower cost than previous Oracle and SimpleDB implementations
• Flexible data model
• Ability to directly influence/implement OSS feature set
• Supports local and wide area distributed operations, spanning US and Europe

Optimized for SSD

Open source

Use case patterns
• Massively scalable
• High performance
• Reliable/Available

[Chart: reads/s and writes/s, Cassandra 0.6 vs. Cassandra 1.0]

Recent Cassandra history
• 0.7 (Jan 2011)
  • CREATE COLUMN FAMILY
  • TTL
  • Secondary (column) indexes
• 0.8 (Jun 2011)
  • Counters
  • Automatic memtable tuning
• 1.0 (Oct 2011)
  • Compression
  • Leveled compaction

Present
• 1.1 (Apr 2012)
  • Self-tuning row + key caches
  • Support for mixed SSD + HDD nodes
  • Row-level isolation

Self-tuning Row Cache

[Diagram, two panels: "Without Cache" (Client, SSTables) and "With Cache" (Client, Row Cache)]

Mixed SSD/HDD Support

[Diagram: a Client and a Cassandra Node; the node's Cassandra Instance holds the user_sessions and user_activity column families, with an SSD and an HDD as storage]

Row Level Isolation

UPDATE Users
SET login='bar', password='bar'
WHERE key='e29b-41d4'

SELECT login, password
FROM Users
WHERE key='e29b-41d4'

[Diagram: a Users row whose login and password are both Foo is updated to Bar, Bar while it is being read. The concurrent SELECT returns:]
Cassandra 1.0: Bar, Foo
Cassandra 1.1: Bar, Bar
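The difference between the two results can be illustrated with a toy model. This is only a sketch of the idea of row-level isolation, not Cassandra's implementation: the in-memory `store` dict and both function names are hypothetical. One function lets each column land separately, so a reader arriving mid-update can see a half-written row; the other prepares the new row privately and publishes it in a single step.

```python
def visible_states_without_isolation(store, key, changes):
    """Apply changes column by column; record what a reader could see after each write."""
    states = []
    for column, value in changes.items():
        store[key][column] = value        # each column becomes visible on its own
        states.append(dict(store[key]))   # snapshot a concurrent reader might get
    return states


def visible_states_with_isolation(store, key, changes):
    """Build the updated row privately, then publish it with one reference swap."""
    new_row = {**store[key], **changes}   # not yet visible to readers
    store[key] = new_row                  # the whole update appears at once
    return [dict(store[key])]


changes = {"login": "bar", "password": "bar"}

store = {"e29b-41d4": {"login": "foo", "password": "foo"}}
partial = visible_states_without_isolation(store, "e29b-41d4", changes)

store = {"e29b-41d4": {"login": "foo", "password": "foo"}}
isolated = visible_states_with_isolation(store, "e29b-41d4", changes)

print(partial)   # includes the mixed state login=bar, password=foo
print(isolated)  # only the fully updated row
```

In the first run the intermediate snapshot mixes the new login with the old password, which corresponds to the "Bar, Foo" result on the slide; in the second run no such state is ever observable.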

Overloading “consistency”
• ACID consistency = referential integrity
• Distributed system consistency
  • {consistency, availability, partition tolerance}

Future
• 1.2 (Oct 2012?)
  • Concurrent schema changes
  • JBOD support
  • Virtual nodes
  • CQL3
  • Collections

Concurrent Schema Changes

[Diagram: two clients issue schema changes to the same Cassandra Cluster at the same time]

Client 1:
CREATE TABLE X;
...
DROP TABLE X;

Client 2:
CREATE TABLE Y;
...
DROP TABLE Y;

JBOD support

[Diagram: a Cassandra Instance with four disks, HDD1 through HDD4]

JBOD support

[Diagram: the same Cassandra Instance, with HDD4 marked with an X]

Virtual nodes

[Diagram: "Ring without vnodes" next to "Ring with vnodes"]

Node Rebuild without vnodes

[Diagram: ring without vnodes, Node 1 through Node 6]

Node Rebuild with vnodes

[Diagram: ring with vnodes, Node 1 through Node 6]
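The rebuild comparison can be sketched with a small ring simulation. Everything here is an assumption made for illustration: the MD5-based `token` function, the choice of 64 tokens per node, and the simplification that a lost range is re-streamed from the next distinct node clockwise. The point it shows is only that a node with many small ranges has many neighbors to rebuild from, while a node with one range has one.

```python
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is an arbitrary choice for this sketch)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def build_ring(nodes, tokens_per_node):
    """Return the ring as a sorted list of (token, node) pairs."""
    ring = [
        (token(f"{node}-{i}"), node)
        for node in nodes
        for i in range(tokens_per_node)
    ]
    ring.sort()
    return ring


def rebuild_sources(ring, dead_node):
    """Nodes that follow each of dead_node's tokens clockwise.

    In this simplified model, those successors hold the replicas that a
    replacement node would stream from.
    """
    sources = set()
    n = len(ring)
    for i, (_, owner) in enumerate(ring):
        if owner != dead_node:
            continue
        j = (i + 1) % n
        while ring[j][1] == dead_node:    # skip the dead node's own tokens
            j = (j + 1) % n
        sources.add(ring[j][1])
    return sources


nodes = [f"Node {i}" for i in range(1, 7)]
without_vnodes = rebuild_sources(build_ring(nodes, 1), "Node 1")
with_vnodes = rebuild_sources(build_ring(nodes, 64), "Node 1")

print(len(without_vnodes), "source node(s) without vnodes")
print(len(with_vnodes), "source node(s) with vnodes")
```

With one token per node the rebuild has a single source; with many tokens per node the work is spread across several peers, so the rebuild can proceed in parallel.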

CQL: You got SQL in my NoSQL!

CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  state text,
  birth_date int
);

CREATE INDEX ON users(state);

SELECT * FROM users
WHERE state='Texas' AND birth_date > 1950;

Strictly “realtime” focused
• No joins
• No subqueries
• No aggregation functions* or GROUP BY
• Strictly limited ORDER BY

Example: CFS sblocks

create column family sblocks
  with comparator = 'UUIDType'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'UUIDType'

sblocks in context

sblocks in CQL3

CREATE TABLE sblocks (
  block_id uuid,
  subblock_id uuid,
  data blob,
  PRIMARY KEY (block_id, subblock_id)
);

block_id   subblock_id   data
Block1     subblock A    data A
Block1     subblock B    data B
...        ...           ...
Block2     subblock C    data C
Block2     subblock D    data D
...        ...           ...
Block3     subblock E    data E
Block3     subblock F    data F
...        ...           ...
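How the CQL3 rows relate to the older column family layout can be sketched as a grouping step. This is a simplified model, not the storage engine: the `to_wide_rows` helper is hypothetical, strings stand in for UUIDs and blobs, and it ignores how CQL3 actually encodes column names internally. It assumes that the first primary key component (block_id) acts as the row key and the second (subblock_id) as the sorted column name.

```python
from collections import defaultdict


def to_wide_rows(cql_rows):
    """Group (block_id, subblock_id, data) tuples into {row key: {column name: value}}."""
    wide = defaultdict(dict)
    for block_id, subblock_id, data in cql_rows:
        wide[block_id][subblock_id] = data
    # Keep column names sorted within each row, as a comparator would.
    return {key: dict(sorted(columns.items())) for key, columns in wide.items()}


rows = [
    ("Block1", "subblock B", "data B"),
    ("Block1", "subblock A", "data A"),
    ("Block2", "subblock C", "data C"),
    ("Block2", "subblock D", "data D"),
]

wide = to_wide_rows(rows)
for block_id, columns in wide.items():
    print(block_id, columns)
```

Under this model, each block becomes one wide row, and fetching all subblocks of a block is a read of a single row in column order.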

Collections

CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  state text,
  birth_date int
);

[Crossed out with an X on the slide:]

CREATE TABLE users_addresses (
  user_id uuid REFERENCES users,
  email text
);

SELECT *
FROM users NATURAL JOIN users_addresses;

Collections

CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  state text,
  birth_date int,
  email_addresses set<text>
);

UPDATE users
SET email_addresses = email_addresses + {'jbellis@gmail.com', 'jbellis@datastax.com'};
