Massively Scalable NoSQL with Apache Cassandra
-
Upload
jbellis -
Category
Technology
-
view
132 -
download
2
description
Transcript of Massively Scalable NoSQL with Apache Cassandra
Massively scalable NoSQL with Apache Cassandra!Jonathan Ellis Project Chair, Apache Cassandra CTO, DataStax@spyced
©2012 DataStax
Big data
Analytics(Hadoop)
Realtime(“NoSQL”)
?
©2012 DataStax
Some Casandra users
©2012 DataStax
eBay
Application/Use Case• Social Signals: like/want/own features for
eBay product and item pages• Hunch taste graph for eBay users and items• Many time series use cases
Why Cassandra? • Multi-datacenter• Scalable• Write performance• Distributed counters• Hadoop support
ACE
©2012 DataStax
Time series data
©2012 DataStax
Multi-datacenter support
©2012 DataStax
Distributed counters
©2012 DataStax
Hadoop support
©2012 DataStax
Disney
Application/Use Case• Meet the data management needs of user
facing applications across The Walt Disney Company with a single platform
Why Cassandra? • DataStax Enterprise can tackle real-time
and search functions in the same cluster• Scalability• 24x7 uptime
NDI
©2012 DataStax
Multitenancy
©2012 DataStax
Multitenancy
©2012 DataStax
Enterprise search
©2012 DataStax
SimpleReachApplication/Use Case• SimpleReach tracks social actions for
content creators, from Twitter and Facebook to Pinterest and Reddit, to deliver detailed insights and clear metrics around social behavior.
Why Cassandra? • Very high velocity data ingest rate and
large data volumes• Workload separation between realtime
and batch applications
NDE
©2012 DataStax
SourceNinjaApplication/Use Case• SourceNinja notifies you to performance,
security, and bug fixes for the software you depend on
Why Cassandra? • Previous database system could not
handle load; HBase has too many points of failure and was too slow
• Fast real time capabilities, batch analytics on that data, and enterprise search
RDE
©2012 DataStax
NetflixApplication/Use Case• General purpose backend for large scale
highly available cloud based web services supporting Netflix Streaming
Why Cassandra? • Highly available, highly robust and no
schema change downtime• Highly scalable, optimized for SSD• Much lower cost than previous Oracle and
SimpleDB implementations• Flexible data model• Ability to directly influence/implement
OSS feature set• Supports local and wide area distributed
operations, spanning US and Europe
RCE
©2012 DataStax
Optimized for SSD
©2012 DataStax
Open source
©2012 DataStax
• Massively scalable
• High performance
• Reliable/Available
Use case patterns
©2012 DataStax
©2012 DataStax
0
5000
10000
15000
20000
25000
30000
35000
Cassandra 0.6
Cassandra 1.0
reads/s writes/s
©2012 DataStax
©2012 DataStax
Classic partitioning with SPOFpartition 1 partition 2 partition 3 partition 4
router
client
©2012 DataStax
Availability• “High availability implies that a single fault will not bring
down your system. Not ‘we’ll recover quickly.’” -- Ben Coverston: DataStax
• “The biggest problem with failover is that you're almost never using it until it really hurts. It's like backups that you never test.” -- Rick Branson: Instagram
©2012 DataStax
Fully distributed, no SPOFclient
p1
p1
p1p3
p6
©2012 DataStax
Partitioning
jim
carol
johnny
suzy
age: 36 car: camaro gender: M
age: 37 car: subaru gender: F
age:12 gender: M
age:10 gender: F
©2012 DataStax
Primary key determines placement*
Partitioning
jim
carol
johnny
suzy
age: 36 car: camaro gender: M
age: 37 car: subaru gender: F
age:12 gender: M
age:10 gender: F
©2012 DataStax
jim
carol
johnny
suzy
PK
5e02739678...
a9a0198010...
f4eb27cea7...
78b421309e...
MD5 Hash
MD5* hash operation yields a 128-bit number
for keysof any size.
©2012 DataStax
Node A
Node D Node C
Node B
The “token ring”
©2012 DataStax
jim 5e02739678...
carol a9a0198010...
johnny f4eb27cea7...
suzy 78b421309e...
Start EndA 0xc000000000..1 0x0000000000..0
B 0x0000000000..1 0x4000000000..0
C 0x4000000000..1 0x8000000000..0
D 0x8000000000..1 0xc000000000..0
©2012 DataStax
jim 5e02739678...
carol a9a0198010...
johnny f4eb27cea7...
suzy 78b421309e...
Start EndA 0xc000000000..1 0x0000000000..0
B 0x0000000000..1 0x4000000000..0
C 0x4000000000..1 0x8000000000..0
D 0x8000000000..1 0xc000000000..0
©2012 DataStax
jim 5e02739678...
carol a9a0198010...
johnny f4eb27cea7...
suzy 78b421309e...
Start EndA 0xc000000000..1 0x0000000000..0
B 0x0000000000..1 0x4000000000..0
C 0x4000000000..1 0x8000000000..0
D 0x8000000000..1 0xc000000000..0
©2012 DataStax
jim 5e02739678...
carol a9a0198010...
johnny f4eb27cea7...
suzy 78b421309e...
Start EndA 0xc000000000..1 0x0000000000..0
B 0x0000000000..1 0x4000000000..0
C 0x4000000000..1 0x8000000000..0
D 0x8000000000..1 0xc000000000..0
©2012 DataStax
jim 5e02739678...
carol a9a0198010...
johnny f4eb27cea7...
suzy 78b421309e...
Start EndA 0xc000000000..1 0x0000000000..0
B 0x0000000000..1 0x4000000000..0
C 0x4000000000..1 0x8000000000..0
D 0x8000000000..1 0xc000000000..0
©2012 DataStax
Node A
Node D Node C
Node B
carol a9a0198010...
Replication
©2012 DataStax
Node A
Node D Node C
Node B
carol a9a0198010...
©2012 DataStax
Node A
Node D Node C
Node B
carol a9a0198010...
©2012 DataStax
Highlights• Adding capacity is application-transparent and requires
no downtime
• No SPOF, not even temporarily• No “primary” replica
• Con!gurable synchronous/asynchronous
• Tolerates node failure; never have to restart replication “from scratch”
• “Smart” replication avoids correlated failures
©2012 DataStax
CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int);
CREATE INDEX ON users(state);
SELECT * FROM users WHERE state=‘Texas’ AND birth_date > 1950;
CQL: You got SQL in my NoSQL!
©2012 DataStax
Strictly “realtime” focused• No joins
• No subqueries
• No aggregation functions* or GROUP BY
• ORDER BY?
©2012 DataStax
Clustered data in in CFS
©2012 DataStax
Clustered data in in CFS
©2012 DataStax
CREATE TABLE sblocks ( block_id uuid, subblock_id uuid, data blob, PRIMARY KEY (block_id, subblock_id));
Clustering in CQL3
block_id subblock_id data
Block1 subblock A data ABlock1 subblock B data B
... ... ...
Block2 subblock C data CBlock2 subblock D data D
... ... ...
Block3 subblock E data EBlock3 subblock F data F
... ... ...
©2012 DataStax
CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int);
CREATE TABLE users_addresses ( user_id uuid REFERENCES users, email text);
SELECT *FROM users NATURAL JOIN users_addresses;
Collections
©2012 DataStax
CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int);
CREATE TABLE users_addresses ( user_id uuid REFERENCES users, email text);
SELECT *FROM users NATURAL JOIN users_addresses;
Collections
X
©2012 DataStax
CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int, email_addresses set<text>);
UPDATE usersSET email_addresses = email_addresses + {‘[email protected]’, ‘[email protected]’};
Collections
©2012 DataStax
Big data
Analytics(Hadoop)
Realtime(“NoSQL”)
?
©2012 DataStax
The evolution of Analytics
Analytics + Realtime
©2012 DataStax
The evolution of Analytics
Analytics Realtime
replication
©2012 DataStax
The evolution of Analytics
ETL
©2012 DataStax
Big data
Analytics(Hadoop)
Realtime(Cassandra)
DatastaxEnterprise
©2012 DataStax
©2012 DataStax
Better Hadoop than Hadoop• “Vanilla” Hadoop
• 8+ services to setup, monitor, backup, and recover(NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker, Zookeeper, Region Server,...)
• Single points of failure
• Can't separate online and offline processing
• DataStax Enterprise• Single, simpli!ed component
• Self-organizes based on workload
• Peer to peer
• JobTracker failover
©2012 DataStax
SELECT title FROM solr WHERE solr_query='title:natio*';
title-------------------------------------------------------------------------- Bolivia national football team 2002 List of French born footballers who have played for other national teams Lithuania national basketball team at Eurobasket 2009 Bolivia national football team 2000 Kenya national under-20 football team Bolivia national football team 1999 Israel men's national inline hockey team Bolivia national football team 2001
Enterprise search with Solr
©2012 DataStax
Managing & Monitoring Big Data
©2012 DataStax
Questions?• http://www.datastax.com/docs
• http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1
• http://www.datastax.com/dev/blog/schema-in-cassandra-1-1
• http://www.datastax.com/products/enterprise