Intro to Cassandra
![Page 1: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/1.jpg)
Intro to Cassandra
Tyler Hobbs
![Page 2: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/2.jpg)
History
– Dynamo (clustering)
– BigTable (data model)
– Cassandra combines the two
![Page 3: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/3.jpg)
Users
![Page 4: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/4.jpg)
Clustering
Every node plays the same role
– No masters, slaves, or special nodes
– No single point of failure
![Page 5: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/5.jpg)
Consistent Hashing
[Figure: ring of nodes at tokens 0, 10, 20, 30, 40, and 50]
Key: "www.google.com"
md5("www.google.com") → 14
Replication Factor = 3
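The ring lookup on these slides can be sketched in a few lines of Python. This is a minimal illustration, not Cassandra's implementation. It assumes that a node owns the token range ending at its own token and that the remaining replicas go on the next nodes clockwise. `token_for`, `replicas_for`, and the ring size of 60 are hypothetical names and values chosen for the toy ring; the slide's token of 14 is taken as given rather than computed.

```python
import bisect
import hashlib

NODE_TOKENS = [0, 10, 20, 30, 40, 50]  # ring positions from the slides
RING_SIZE = 60                          # assumed size of the toy token space


def token_for(key: str) -> int:
    """Hypothetical helper: squeeze an MD5 digest into the toy token space."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % RING_SIZE


def replicas_for(token: int, node_tokens=NODE_TOKENS, rf: int = 3):
    """Return the tokens of the rf nodes that would store this token."""
    ring = sorted(node_tokens)
    # First node whose token is >= the key's token, wrapping past the end.
    start = bisect.bisect_left(ring, token) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(rf)]


print(replicas_for(14))  # [20, 30, 40]
print(replicas_for(55))  # [0, 10, 20] (wraps around the ring)
```

With the slide's token of 14 and a replication factor of 3, the key would land on the nodes at 20, 30, and 40 under these assumptions.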
![Page 11: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/11.jpg)
Clustering
Client can talk to any node
![Page 12: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/12.jpg)
Scaling
[Figure: ring with nodes at 0, 10, 20, 30, and 50; RF = 2]
The node at 50 owns the red portion
![Page 13: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/13.jpg)
Scaling
[Figure: the same ring with a node added at 40; RF = 2]
Add a new node at 40
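A rough sketch of why adding the node at 40 is cheap: only the range that the new node splits changes hands. The code assumes a node's primary range runs from the previous node's token (exclusive) to its own token (inclusive), and it ignores replica ranges, which also shift when RF is greater than 1. `primary_ranges` is a hypothetical helper, not a Cassandra API.

```python
def primary_ranges(node_tokens):
    """Map each node token to the (exclusive start, inclusive end) range it owns."""
    ring = sorted(node_tokens)
    # ring[i - 1] wraps to the last node when i == 0.
    return {tok: (ring[i - 1], tok) for i, tok in enumerate(ring)}


before = primary_ranges([0, 10, 20, 30, 50])
after = primary_ranges([0, 10, 20, 30, 40, 50])

print(before[50])            # (30, 50)
print(after[40], after[50])  # (30, 40) (40, 50)

# Existing nodes whose primary range changed after the new node joined.
changed = [tok for tok in before if before[tok] != after[tok]]
print(changed)               # [50]
```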
![Page 15: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/15.jpg)
Node Failures
[Figure: ring with nodes at 0, 10, 20, 30, 40, and 50; RF = 2; the replicas are labeled]
![Page 18: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/18.jpg)
Consistency, Availability
Consistency
– Can I read stale data?
Availability
– Can I write/read at all?
Tunable Consistency
![Page 19: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/19.jpg)
Consistency
N = Total number of replicas
R = Number of replicas read from
– (before the response is returned)
W = Number of replicas written to
– (before the write is considered a success)
W + R > N gives strong consistency
N = 3, W = 2, R = 2
2 + 2 > 3 ==> strongly consistent
Only 2 of the 3 replicas must be available.
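The reason W + R > N gives strong consistency is that every read set must then share at least one replica with every write set, so a read always touches a replica holding the latest successful write. The brute-force check below illustrates that overlap for small N; `always_overlap` is a hypothetical helper written for this sketch.

```python
from itertools import combinations


def always_overlap(n: int, w: int, r: int) -> bool:
    """True if every set of w written replicas shares a node with every set of r read replicas."""
    replicas = range(n)
    return all(
        set(written) & set(read)
        for written in combinations(replicas, w)
        for read in combinations(replicas, r)
    )


print(always_overlap(3, 2, 2))  # True:  2 + 2 > 3
print(always_overlap(3, 1, 2))  # False: 1 + 2 = 3
print(always_overlap(3, 1, 3))  # True:  1 + 3 > 3
```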
![Page 23: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/23.jpg)
Consistency
Tunable Consistency
– Specify N (Replication Factor) per data set
– Specify R, W per operation
– Quorum: N/2 + 1 (N/2 rounded down)
• R = W = Quorum
• Strong consistency
• Tolerate the loss of N - Quorum replicas
– R, W can also be 1 or N
![Page 25: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/25.jpg)
Availability
Can tolerate the loss of:
– N - R replicas for reads
– N - W replicas for writes
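The quorum and availability arithmetic is easy to check numerically. The sketch below assumes integer division for N/2; `quorum` and `tolerated_losses` are hypothetical helper names.

```python
def quorum(n: int) -> int:
    """Quorum size for n replicas: N/2 (rounded down) + 1."""
    return n // 2 + 1


def tolerated_losses(n: int, r: int, w: int) -> dict:
    """Replicas that can be down while reads and writes still succeed."""
    return {"reads": n - r, "writes": n - w}


for n in (3, 5):
    q = quorum(n)
    print(n, q, tolerated_losses(n, r=q, w=q))
# 3 2 {'reads': 1, 'writes': 1}
# 5 3 {'reads': 2, 'writes': 2}

# R = 1, W = N: reads survive the loss of N - 1 replicas, writes survive none.
print(tolerated_losses(3, r=1, w=3))  # {'reads': 2, 'writes': 0}
```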
![Page 26: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/26.jpg)
CAP Theorem
During node or network failure:
[Figure: Availability plotted against Consistency, each axis running to 100%, with regions labeled "Possible" and "Not Possible"; Cassandra is marked on the plot]
![Page 28: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/28.jpg)
Clustering
No single point of failure
Replication that works
Scales linearly
– 2x nodes = 2x performance
• For both writes and reads
– Up to hundreds of nodes
Operationally simple
Multi-Datacenter Replication
![Page 29: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/29.jpg)
Data Model
Comes from Google BigTable
Goals
– Minimize disk seeks
– High throughput
– Low latency
– Durable
![Page 30: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/30.jpg)
Data Model
Keyspace
– A collection of Column Families
– Controls replication settings
Column Family
– Kinda resembles a table
![Page 31: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/31.jpg)
Column Families
Static
– Object data
– Similar to a table in a relational database
Dynamic
– Pre-calculated query results
– Materialized views
![Page 32: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/32.jpg)
Static Column Families
Users
zznate   password: *   name: Nate
driftx   password: *   name: Brandon
thobbs   password: *   name: Tyler
jbellis  password: *   name: Jonathan   site: riptano.com
![Page 33: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/33.jpg)
Dynamic Column Families
Rows
– Each row has a unique primary key
– Sorted list of (name, value) tuples
• Like a sorted map or dictionary
– The (name, value) tuple is called a "column"
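One way to picture a column family is as a map from row key to a sorted map of columns. The Python sketch below models that with plain dictionaries and the Users data from the earlier slide; it is only a mental model, not how Cassandra stores data, and `columns` is a hypothetical helper.

```python
# Column family: row key -> {column name: value}. Rows need not share columns.
users = {
    "zznate": {"password": "*", "name": "Nate"},
    "driftx": {"password": "*", "name": "Brandon"},
    "thobbs": {"password": "*", "name": "Tyler"},
    "jbellis": {"password": "*", "name": "Jonathan", "site": "riptano.com"},
}


def columns(column_family, row_key):
    """Return a row's columns as (name, value) tuples sorted by column name."""
    return sorted(column_family[row_key].items())


print(columns(users, "jbellis"))
# [('name', 'Jonathan'), ('password', '*'), ('site', 'riptano.com')]
```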
![Page 34: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/34.jpg)
Dynamic Column Families
Following
[Figure: rows keyed by zznate, driftx, thobbs, and jbellis; the columns in each row are named after users (driftx, thobbs, mdennis, zznate, pcmanus, xedin)]
![Page 35: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/35.jpg)
Dynamic Column Families
Column Timestamps
– Each column (tuple) has a timestamp
– In the case of a collision, the latest timestamp wins
– Client specifies timestamp with write
– Writes are idempotent
• Infinite retries allowed
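The timestamp rule can be sketched as a last-write-wins merge. Because the client supplies the timestamp, replaying the same write any number of times, in any order, leaves the row in the same state. This is a simplified model: `apply_write` is a hypothetical helper, and ties between different values carrying the same timestamp are not modeled.

```python
def apply_write(row, name, value, timestamp):
    """Store the column unless the row already holds an equally new or newer version."""
    current = row.get(name)
    if current is None or timestamp > current[1]:
        row[name] = (value, timestamp)
    return row


row = {}
apply_write(row, "age", 24, timestamp=100)
apply_write(row, "age", 34, timestamp=200)
apply_write(row, "age", 24, timestamp=100)  # late retry of the first write
apply_write(row, "age", 34, timestamp=200)  # retry of the second write
print(row)  # {'age': (34, 200)}
```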
![Page 36: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/36.jpg)
Dynamic Column Families
Other Examples:
– Timeline of tweets by a user
– Timeline of tweets by all of the people a user is following
– List of comments sorted by score
– List of friends grouped by state
![Page 37: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/37.jpg)
The Data API
Two choices
– RPC-based API
– CQL
• Cassandra Query Language
![Page 38: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/38.jpg)
Inserting Data
INSERT INTO users (KEY, "name", "age") VALUES ('thobbs', 'Tyler', 24);
![Page 39: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/39.jpg)
Updating Data
Updates are the same as inserts:
INSERT INTO users (KEY, "age") VALUES ('thobbs', 34);
Or
UPDATE users SET "age" = 34 WHERE KEY = 'thobbs';
![Page 40: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/40.jpg)
Fetching Data
Whole row select:
SELECT * FROM users WHERE KEY = 'thobbs';
![Page 41: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/41.jpg)
Fetching Data
Explicit column select:
SELECT "name", "age" FROM users WHERE KEY = 'thobbs';
![Page 42: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/42.jpg)
Fetching Data
Get a slice of columns
UPDATE letters SET 1='a', 2='b', 3='c', 4='d', 5='e' WHERE KEY = 'key';
SELECT 1..3 FROM letters WHERE KEY = 'key';
Returns [(1, a), (2, b), (3, c)]
![Page 43: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/43.jpg)
Fetching Data
Get a slice of columns
SELECT FIRST 2 FROM letters WHERE KEY = 'key';
Returns [(1, a), (2, b)]
SELECT FIRST 2 REVERSED FROM letters WHERE KEY = 'key';
Returns [(5, e), (4, d)]
![Page 44: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/44.jpg)
Fetching Data
Get a slice of columns
SELECT 3..'' FROM letters WHERE KEY = 'key';
Returns [(3, c), (4, d), (5, e)]
SELECT FIRST 2 REVERSED 4..'' FROM letters WHERE KEY = 'key';
Returns [(4, d), (3, c)]
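The slice queries on the last three slides can be mimicked in Python to make the semantics concrete. This sketch models only what the slides show: columns are kept sorted by name, a slice has an optional start and end, FIRST limits the count, and REVERSED walks the row from the high end, so the slice start becomes an upper bound. `slice_columns` and its parameters are hypothetical names, not part of any Cassandra client.

```python
letters = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}


def slice_columns(row, start=None, end=None, first=None, reverse=False):
    """Return (name, value) tuples from a row, following the slide's slice rules."""
    names = sorted(row, reverse=reverse)

    def in_range(name):
        # When reversed, the slice runs downward from start to end.
        low, high = (end, start) if reverse else (start, end)
        if low is not None and name < low:
            return False
        if high is not None and name > high:
            return False
        return True

    selected = [(name, row[name]) for name in names if in_range(name)]
    return selected if first is None else selected[:first]


print(slice_columns(letters, start=1, end=3))               # [(1, 'a'), (2, 'b'), (3, 'c')]
print(slice_columns(letters, first=2))                      # [(1, 'a'), (2, 'b')]
print(slice_columns(letters, first=2, reverse=True))        # [(5, 'e'), (4, 'd')]
print(slice_columns(letters, start=3))                      # [(3, 'c'), (4, 'd'), (5, 'e')]
print(slice_columns(letters, start=4, first=2, reverse=True))  # [(4, 'd'), (3, 'c')]
```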
![Page 45: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/45.jpg)
Deleting Data
Delete a whole row:
DELETE FROM users WHERE KEY = 'thobbs';
Delete specific columns:
DELETE "age" FROM users WHERE KEY = 'thobbs';
![Page 46: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/46.jpg)
Secondary Indexes
Built-in basic indexes
CREATE INDEX ageIndex ON users (age);
SELECT name FROM users WHERE age = 24 AND state = 'TX';
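As a toy model of what an index on `age` buys the query above, the sketch below builds a map from column value to row keys, answers `age = 24` from that map, and then filters the candidates on `state`. It says nothing about how Cassandra stores its indexes. Every row other than `thobbs`, the `state` values, and the `build_index` helper are made up for the illustration.

```python
from collections import defaultdict

users = {
    "thobbs": {"name": "Tyler", "age": 24, "state": "TX"},  # state is assumed
    "alice": {"name": "Alice", "age": 24, "state": "CA"},   # hypothetical row
    "bob": {"name": "Bob", "age": 31, "state": "TX"},       # hypothetical row
}


def build_index(rows, column):
    """Map each value of `column` to the set of row keys that hold it."""
    index = defaultdict(set)
    for key, cols in rows.items():
        if column in cols:
            index[cols[column]].add(key)
    return index


age_index = build_index(users, "age")

# age = 24 is answered from the index; state = 'TX' is then checked row by row.
matches = sorted(
    users[key]["name"]
    for key in age_index.get(24, set())
    if users[key].get("state") == "TX"
)
print(matches)  # ['Tyler']
```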
![Page 47: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/47.jpg)
Performance
Writes
– 10k to 30k per second per node
– Sub-millisecond latency
Reads
– 1k to 10k per second per node
– Depends on data set, caching
– Usually 0.1 to 10 ms latency
![Page 48: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/48.jpg)
Other Features
Distributed Counters
– Can support millions of high-volume counters
Excellent Multi-datacenter Support
– Disaster recovery
– Locality
Hadoop Integration
– Isolation of resources
– Hive and Pig drivers
Compression
![Page 49: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/49.jpg)
What Cassandra Can't Do
Transactions
– Unless you use a distributed lock
– Atomicity, Isolation
– These aren't needed as often as you'd think
Limited support for ad-hoc queries
– Know what you want to do with the data
![Page 50: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/50.jpg)
Not One-size-fits-all
Use alongside an RDBMS
– Use the RDBMS for highly-transactional or highly-relational data
• Usually a small set of data
– Let Cassandra scale to handle the rest
![Page 51: Intro to Cassandra](https://reader033.fdocuments.us/reader033/viewer/2022052903/557a35ced8b42a32248b4909/html5/thumbnails/51.jpg)
Language Support
Good:
– Java
– Python
– Ruby
– PHP
– C#
Coming Soon:
– Everything else, now that we have CQL