Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
Transcript of Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)
![Page 1: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/1.jpg)
Cassandra 1.0 and the future of big data
Jonathan Ellis
Tuesday, October 4, 2011
![Page 2: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/2.jpg)
About me
✤ Project chair, Apache Cassandra
  ✤ Active since Dec 2008
  ✤ First non-Facebook committer
  ✤ Wrote ~30% of committed patches, reviewed ~40% of the rest
✤ Distributed systems background
  ✤ At Mozy, built a multi-petabyte, scalable storage system based on Reed-Solomon encoding
✤ Founder and CTO, DataStax
![Page 3: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/3.jpg)
About DataStax
✤ Founded in April 2010
✤ Commercial leader in Apache Cassandra
✤ 100+ customers
✤ 30+ employees
✤ Home to Apache Cassandra Chair & most committers
✤ Headquartered in San Francisco Bay area, California
✤ Secured $11M in Series B funding in Sep 2011
![Page 4: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/4.jpg)
Job Trends (indeed.com)
![Page 5: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/5.jpg)
“Big Data” trend
![Page 6: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/6.jpg)
Big data
Analytics (Hadoop)
Realtime (“NoSQL”)?
![Page 7: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/7.jpg)
Some Cassandra users
✤ Financial
✤ Social Media
✤ Advertising
✤ Entertainment
✤ Energy
✤ E-tail
✤ Health care
✤ Government
![Page 8: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/8.jpg)
Common use cases
✤ Time series data
✤ Messaging
✤ Ad tracking
✤ Data mining
✤ User activity streams
✤ User sessions
✤ Anything requiring: Scalable + performant + highly available
![Page 9: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/9.jpg)
Why people choose Cassandra
✤ Multi-master, multi-DC
✤ Linearly scalable
✤ Larger-than-memory datasets
✤ Best-in-class performance (not just writes!)
✤ Fully durable
✤ Integrated caching
✤ Tuneable consistency
![Page 10: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/10.jpg)
0.7
✤ CREATE COLUMN FAMILY
✤ Expiring columns (TTL)
✤ Secondary (column) indexes
✤ Efficient streaming
✤ Efficient cross-datacenter writes
![Page 11: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/11.jpg)
0.8
✤ CQL
✤ Counters
✤ Automatic memtable tuning
✤ New bulk load interface
![Page 12: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/12.jpg)
1.0
✤ Compression
✤ Read performance
✤ LeveledCompactionStrategy
✤ CQL 2.0
![Page 13: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/13.jpg)
Compression
✤ Rows-per-block or blocks-per-row
![Page 14: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/14.jpg)
Classic size-tiered compaction
![Page 15: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/15.jpg)
Level-based Compaction
✤ SSTables are non-overlapping within a level
✤ Bounds the number that can contain a given row
L2: 1000 MB
L1: 100 MB
L0: newly flushed
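Why non-overlapping levels bound the number of sstables per row can be sketched in a few lines. This is an illustration only, not Cassandra's code: the `SSTable` tuple and `candidates` function are hypothetical names, and each sstable is reduced to its first and last row key. L0 may overlap, so every L0 sstable is checked; in each higher level at most one sstable can cover a given key.

```python
from bisect import bisect_right
from typing import List, NamedTuple


class SSTable(NamedTuple):
    first_key: str  # smallest row key stored in this sstable
    last_key: str   # largest row key stored in this sstable


def candidates(levels: List[List[SSTable]], key: str) -> List[SSTable]:
    """Return the sstables whose key range covers `key`.

    levels[0] is L0 (ranges may overlap). Every later level is assumed
    to be sorted by first_key with non-overlapping ranges.
    """
    # L0: newly flushed sstables can overlap, so each one is checked.
    hits = [t for t in levels[0] if t.first_key <= key <= t.last_key]
    for level in levels[1:]:
        # Binary search for the last sstable starting at or before `key`.
        starts = [t.first_key for t in level]
        i = bisect_right(starts, key) - 1
        if i >= 0 and key <= level[i].last_key:
            hits.append(level[i])  # at most one hit per level
    return hits
```

With N levels above L0, a read touches at most N sstables plus whatever is still in L0.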
![Page 16: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/16.jpg)
Read performance: maxtimestamp
✤ Sort sstables by maximum (client-provided) timestamp
✤ Only merge sstables until we have the columns requested
✤ Allows pre-merging highly fragmented rows without waiting for compaction
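A minimal sketch of the idea, under stated assumptions: `SSTable` and `read_columns` are hypothetical names, each sstable holds the columns of a single row, and the stopping rule shown here (all requested columns found, and none older than the next sstable's maximum timestamp) is one reasonable reading of the slide rather than Cassandra's exact logic.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class SSTable(NamedTuple):
    max_timestamp: int                   # newest write timestamp in the sstable
    columns: Dict[str, Tuple[int, str]]  # column name -> (timestamp, value)


def read_columns(sstables: Iterable[SSTable], wanted: Set[str]):
    """Return ({column: value}, number_of_sstables_consulted)."""
    result: Dict[str, Tuple[int, str]] = {}
    consulted = 0
    # Visit the sstables with the newest data first.
    for table in sorted(sstables, key=lambda t: t.max_timestamp, reverse=True):
        # Stop once every wanted column is present and no remaining
        # (older) sstable could hold a newer version of any of them.
        if all(c in result and result[c][0] >= table.max_timestamp for c in wanted):
            break
        consulted += 1
        for name in wanted:
            if name in table.columns:
                ts, value = table.columns[name]
                if name not in result or ts > result[name][0]:
                    result[name] = (ts, value)
    return {name: value for name, (ts, value) in result.items()}, consulted
```

A row whose requested columns all live in the newest sstable is served from that one sstable, however many older fragments exist.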
![Page 17: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/17.jpg)
Results
![Page 18: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/18.jpg)
CQL
cqlsh> SELECT * FROM users WHERE state='UT' AND birth_date > 1970;

KEY | birth_date | full_name | state |
bsanderson | 1975 | Brandon Sanderson | UT |
![Page 19: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/19.jpg)
CQL 2.0
✤ ALTER
✤ Counter support
✤ TTL support
✤ SELECT count(*)
![Page 20: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/20.jpg)
Post-1.0 features
✤ Ease Of Use
✤ CQL
  ✤ “Native” transport
  ✤ Composite columns
  ✤ Prepared statements
✤ Triggers
✤ Entity groups
✤ Smarter range queries
  ✤ Enables more-efficient analytics
![Page 21: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/21.jpg)
The evolution of Analytics
Analytics + Realtime
![Page 22: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/22.jpg)
The evolution of Analytics
Analytics Realtime
replication
![Page 23: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/23.jpg)
The evolution of Analytics
ETL
![Page 24: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/24.jpg)
Big data
Analytics (Hadoop)
Realtime (Cassandra)
DataStax Enterprise
![Page 25: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/25.jpg)
DataStax Enterprise re-unifies realtime and analytics
![Page 26: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/26.jpg)
![Page 27: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/27.jpg)
Data model: Realtime
Portfolios
KEY | GOOG | LNKD | P | AMZN | AAPL |
Portfolio1 | 80 | 20 | 40 | 100 | 20 |

StockHist
KEY | 2011-01-01 | 2011-01-02 | 2011-01-03 |
GOOG | $79.85 | $75.23 | $82.11 |

LiveStocks
KEY | last |
GOOG | $95.52 |
AAPL | $186.10 |
AMZN | $112.98 |
![Page 28: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/28.jpg)
Data model: Analytics
HistLoss
KEY | worst_date | loss |
Portfolio1 | 2011-07-23 | -$34.81 |
Portfolio2 | 2011-03-11 | -$11432.24 |
Portfolio3 | 2011-05-21 | -$1476.93 |
![Page 29: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/29.jpg)
Data model: Analytics
10dayreturns
ticker | rdate | return |
GOOG | 2011-07-25 | $8.23 |
GOOG | 2011-07-24 | $6.14 |
GOOG | 2011-07-23 | $7.78 |
AAPL | 2011-07-25 | $15.32 |
AAPL | 2011-07-24 | $12.68 |

INSERT OVERWRITE TABLE 10dayreturns
SELECT a.row_key ticker, b.column_name rdate, b.value - a.value
FROM StockHist a JOIN StockHist b
  ON (a.row_key = b.row_key AND date_add(a.column_name,10) = b.column_name);
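The self-join in the query above can be mirrored in plain Python to make its logic explicit. This is a sketch, not part of the demo: `ten_day_returns` is a hypothetical name, prices are assumed to be floats, and column names are assumed to be ISO dates.

```python
from datetime import date, timedelta


def ten_day_returns(rows, days=10):
    """rows: (row_key, column_name, value) tuples, as in the StockHist table.

    Pairs each price with the price of the same ticker `days` later and
    emits (ticker, rdate, later_price - earlier_price).
    """
    price = {(ticker, day): value for ticker, day, value in rows}
    out = []
    for (ticker, day), start in price.items():
        # Equivalent of date_add(a.column_name, 10) = b.column_name
        rdate = (date.fromisoformat(day) + timedelta(days=days)).isoformat()
        end = price.get((ticker, rdate))
        if end is not None:
            out.append((ticker, rdate, round(end - start, 2)))
    return sorted(out)
```

Dates with no matching price ten days later simply produce no row, as with the inner join in the Hive query.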
![Page 30: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/30.jpg)
Data model: Analytics
KEY | 2011-01-01 | 2011-01-02 | 2011-01-03 |
GOOG | $79.85 | $75.23 | $82.11 |

row_key | column_name | value |
GOOG | 2011-01-01 | $8.23 |
GOOG | 2011-01-02 | $6.14 |
GOOG | 2011-01-03 | $7.78 |
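The mapping on this slide, from one wide Cassandra row to several (row_key, column_name, value) rows, can be sketched as follows. The nested dict standing in for the column family and the `flatten` helper are illustrative assumptions; the sample prices are the GOOG columns from the StockHist row.

```python
# A column family modelled as {row_key: {column_name: value}}.
stock_hist = {
    "GOOG": {"2011-01-01": 79.85, "2011-01-02": 75.23, "2011-01-03": 82.11},
}


def flatten(column_family):
    """Turn each wide row into one (row_key, column_name, value) tuple per column."""
    return [
        (row_key, name, value)
        for row_key, columns in column_family.items()
        for name, value in sorted(columns.items())
    ]
```

Each column of the wide row becomes its own relational row, which is what lets the Hive queries join on `row_key` and `column_name`.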
![Page 31: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/31.jpg)
Data model: Analytics
portfolio_returns
portfolio | rdate | preturn |
Portfolio1 | 2011-07-25 | $118.21 |
Portfolio1 | 2011-07-24 | $60.78 |
Portfolio1 | 2011-07-23 | -$34.81 |
Portfolio2 | 2011-07-25 | $2143.92 |
Portfolio3 | 2011-07-24 | -$10.19 |

INSERT OVERWRITE TABLE portfolio_returns
SELECT row_key portfolio, rdate, SUM(b.return)
FROM portfolios a JOIN 10dayreturns b
  ON (a.column_name = b.ticker)
GROUP BY row_key, rdate;
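The join and GROUP BY above amount to summing, per portfolio and date, the returns of every ticker the portfolio holds. A small Python sketch of that aggregation follows; `portfolio_returns` is a hypothetical helper, returns are assumed to be floats, and, like the query as written, it does not weight returns by the number of shares.

```python
from collections import defaultdict


def portfolio_returns(portfolios, returns):
    """portfolios: (row_key, column_name, value) rows from Portfolios.
    returns: (ticker, rdate, ret) rows from 10dayreturns.
    """
    totals = defaultdict(float)
    for portfolio, ticker, _shares in portfolios:
        for r_ticker, rdate, ret in returns:
            if r_ticker == ticker:                   # ON a.column_name = b.ticker
                totals[(portfolio, rdate)] += ret    # SUM(b.return) per group
    return sorted((p, d, round(total, 2)) for (p, d), total in totals.items())
```

The nested loop is quadratic and only meant to show the semantics; Hive executes the same join as a distributed job.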
![Page 32: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/32.jpg)
Data model: Analytics
INSERT OVERWRITE TABLE HistLoss
SELECT a.portfolio, rdate, minp
FROM (
  SELECT portfolio, min(preturn) as minp
  FROM portfolio_returns
  GROUP BY portfolio
) a JOIN portfolio_returns b
  ON (a.portfolio = b.portfolio and a.minp = b.preturn);

HistLoss
KEY | worst_date | loss |
Portfolio1 | 2011-07-23 | -$34.81 |
Portfolio2 | 2011-03-11 | -$11432.24 |
Portfolio3 | 2011-05-21 | -$1476.93 |
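The query finds each portfolio's minimum return and joins back to recover the date on which it occurred. A single pass does the same in Python; `hist_loss` is a hypothetical name, and unlike the Hive join, which would emit one row per tied date, this sketch keeps only the first date seen for a tie.

```python
def hist_loss(returns):
    """returns: (portfolio, rdate, preturn) rows from portfolio_returns.

    Returns {portfolio: (worst_date, loss)}.
    """
    worst = {}
    for portfolio, rdate, preturn in returns:
        # Keep the row with the smallest (most negative) return so far.
        if portfolio not in worst or preturn < worst[portfolio][1]:
            worst[portfolio] = (rdate, preturn)
    return worst
```

The resulting mapping has the same shape as the HistLoss column family: one row per portfolio with `worst_date` and `loss` columns.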
![Page 33: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/33.jpg)
Portfolio Demo dataflow
Portfolios
Historical Prices
Intermediate Results
Largest loss
Portfolios
Live Prices for today
Largest loss
![Page 34: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/34.jpg)
Operations
✤ “Vanilla” Hadoop
  ✤ 8+ services to set up, monitor, back up, and recover (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker, Zookeeper, Region Server, ...)
  ✤ Single points of failure
  ✤ Can't separate online and offline processing
✤ DataStax Enterprise
  ✤ Single, simplified component
  ✤ Self-organizes based on workload
  ✤ Peer to peer
  ✤ JobTracker failover
  ✤ No additional Cassandra config
![Page 35: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/35.jpg)
OpsCenter
![Page 37: Cassandra 1.0 and the future of big data (Cassandra Tokyo 2011)](https://reader034.fdocuments.us/reader034/viewer/2022052310/54b7b9434a7959181f8b46b8/html5/thumbnails/37.jpg)