Cassandra Training Session 2svn.wso2.org/repos/wso2/people/kasunw/BAM/Cassandra/Cassandra... ·...
Outline
• Demand for Scalability
• Designing for Scalability
• Cassandra In Theory
– Data model, Architecture, Configuration, Reading and Writing data
• Cassandra In Action [deep]
– Setting up a cluster, Keyspaces, etc…
– Accessing data and Monitoring a cluster
Demand for Scalability
• Now we are in the Information Age
– Data as well as Consumers are growing
• How much data do you produce / consume in each hour?
– What we need is a data store with:
• low latency, high throughput, high scalability and availability, low cost, etc…
Demand for Scalability Contd…
• Can the relational database be the Silver Bullet?
– Its strengths for one use case can become a bottleneck for another use case
• ACID -> two-phase commit (2PC) -> blocking -> not scalable
• Schema -> Normalization -> Lots of Tables -> Lots of JOIN operations and complex queries -> not scalable
Designing for Scalability
• Sharding
[Figure: a single Data Store is split into shards by first letter. Data Store-L holds words that begin with L; Data Store-S holds words that begin with S. A query for Leonardo da Vinci goes to Data Store-L and a query for Sherlock Holmes goes to Data Store-S. 26 nodes in total.]
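The first-letter sharding in the figure can be sketched as follows. This is a minimal illustration, not how Cassandra partitions data: the class `LetterShardRouter` and its methods are hypothetical names, and each shard is just an in-memory map standing in for a separate data store node.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: routes each key to one of 26 shards by its first letter. */
final class LetterShardRouter {
    // One in-memory map per shard stands in for a separate data store node.
    private final Map<Character, Map<String, String>> shards = new HashMap<>();

    /** Returns the shard id ('A'..'Z') responsible for the given key. */
    static char shardFor(String key) {
        if (key == null || key.isEmpty()) {
            throw new IllegalArgumentException("key must be non-empty");
        }
        char first = Character.toUpperCase(key.charAt(0));
        if (first < 'A' || first > 'Z') {
            throw new IllegalArgumentException("key must start with a letter A-Z: " + key);
        }
        return first;
    }

    /** Stores the value on the shard that owns the key. */
    void put(String key, String value) {
        shards.computeIfAbsent(shardFor(key), c -> new HashMap<>()).put(key, value);
    }

    /** Looks the key up on the one shard that can hold it; returns null if absent. */
    String get(String key) {
        Map<String, String> shard = shards.get(shardFor(key));
        return shard == null ? null : shard.get(key);
    }

    /** Number of entries held by a shard, useful for spotting uneven load. */
    int shardSize(char shard) {
        Map<String, String> entries = shards.get(Character.toUpperCase(shard));
        return entries == null ? 0 : entries.size();
    }
}
```

A query touches exactly one shard, which is what lets the stores scale independently. The weakness is uneven load: some letters are far more common than others.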
Designing for Scalability Contd …
• Shared-Nothing Architecture
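[Figure: each data store holds its own range plus copies of its neighbours' ranges and shares nothing with the others. Data Store-A: words that begin with Y, Z, A. Data Store-O: words that begin with M, N, O. Data Store-Z: words that begin with X, Y, Z. Data Store-P: words that begin with N, O, P.]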
Cassandra
• Designed for Scalability
• Dynamo’s Architecture and BigTable’s Data model
• CAP (Consistency, Availability, Partition Tolerance) Theorem
– A and P
Cassandra Data Model
• A distributed multidimensional map with 4 or 5 dimensions.
• What about the Relational Model?
– Is it a 4-D map?
• {Database, Table, Row, Column} => Value (Cell)
{CarbonDB, permission table, resource path, users} => A list of user names
Data Model Contd…
• 4D Model : [Keyspace][ColumnFamily][Key][Column]
• Keyspace -> Column Family
• Column Family -> Column Family Row
• Column Family Row -> Columns
• Column -> Data value
My Yahoo:                   (Keyspace)
  AddressBook:              (CF)
    friend_one:             (Row)
      name: foo             (Column)
      phone No: 3234353     (Column)
{My Yahoo, AddressBook, friend_one, name} => foo
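One way to picture the 4D model is as a map nested four levels deep. The sketch below is only an illustration of that idea, using standard Java collections; `FourDMap` and its method names are hypothetical and have nothing to do with Cassandra's storage engine.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of [Keyspace][ColumnFamily][Key][Column] => value as nested maps. */
final class FourDMap {
    // keyspace -> column family -> row key -> column name -> value
    private final Map<String, Map<String, Map<String, Map<String, String>>>> data = new TreeMap<>();

    /** Stores a value, creating the intermediate levels on demand. */
    void put(String keyspace, String columnFamily, String rowKey, String column, String value) {
        data.computeIfAbsent(keyspace, k -> new TreeMap<>())
            .computeIfAbsent(columnFamily, k -> new TreeMap<>())
            .computeIfAbsent(rowKey, k -> new TreeMap<>())
            .put(column, value);
    }

    /** Walks the four dimensions in order; returns null if any level is missing. */
    String get(String keyspace, String columnFamily, String rowKey, String column) {
        Map<String, Map<String, Map<String, String>>> columnFamilies = data.get(keyspace);
        if (columnFamilies == null) return null;
        Map<String, Map<String, String>> rows = columnFamilies.get(columnFamily);
        if (rows == null) return null;
        Map<String, String> columns = rows.get(rowKey);
        return columns == null ? null : columns.get(column);
    }
}
```

With this sketch, `put("My Yahoo", "AddressBook", "friend_one", "name", "foo")` followed by `get("My Yahoo", "AddressBook", "friend_one", "name")` returns `foo`, mirroring the address-book example.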
Data Model Contd…
• 5D : [Keyspace][ColumnFamily][Key][SuperColumn][SubColumn]
• Keyspace -> Super Column Family
• Super Column Family -> Super Column Family Row
• Super Column Family Row -> Super Columns
• Super Column -> Columns
• Column -> Data value
My Yahoo:                     (Keyspace)
  AddressBook:                (SCF)
    WSO2:                     (Row)
      friend_one:             (Super Column)
        name: foo             (Column)
        phone No: 3234353     (Column)
Data Model Contd…
• Cluster is a container for keyspaces.
• Rows as well as Columns in a row are sorted.
• Schema-free
Profile:
  personal: {name: foo}, {address: some}, {phone: 081}, …
  education: {primary: somewhere}, {secondary: somewhere}, …
  hobbies: {sports: some details}, {music: some details}, …
Cassandra needs one CF or SCF, but a relational DB needs more tables (normalization).
Data Model Contd…
• Secondary Index
– Indexed by columns
Name    Main God    Conquered By    Archeological Site
Inca    Sun         Spanish         Machu Picchu
Maya    Sun         Spanish         Palenque
Primary Index:
  What is the Main God of the Ancient Maya Civilization?
Secondary Index:
  What are the civilizations conquered by Spanish?
  What is the civilization conquered by Spanish and having the Archeological Site Machu Picchu?
Cassandra Architecture
• System Keyspace
• A peer-to-peer distribution model
– Decentralized -> Availability and Scalability
[Figure: token ring with positions 0, 1, 2 and nodes A, B, C, D placed around it]
Keys mapped to the Token range [0.1,1] => B, C, D
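The mapping from a token range to a list of nodes can be sketched as a walk around a sorted ring. This is a simplified illustration under stated assumptions: integer tokens are chosen by hand, and the class `TokenRing`, its methods, and the token values in the usage note are all hypothetical. It does not reproduce Cassandra's partitioner or replication strategy classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch: nodes own tokens on a ring; a key goes to the next nodes clockwise. */
final class TokenRing {
    // token -> node name, kept sorted so the ring can be walked in order
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(long token, String node) {
        ring.put(token, node);
    }

    /**
     * Returns the nodes that hold a key with the given token: the first node whose
     * token is >= the key token, then the following nodes clockwise, wrapping around.
     */
    List<String> replicasFor(long keyToken, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        int wanted = Math.min(replicationFactor, ring.size());
        Long current = ring.ceilingKey(keyToken);
        if (current == null) {
            current = ring.firstKey(); // past the last token: wrap to the start of the ring
        }
        while (replicas.size() < wanted) {
            replicas.add(ring.get(current));
            current = ring.higherKey(current);
            if (current == null) {
                current = ring.firstKey();
            }
        }
        return replicas;
    }
}
```

For example, with nodes A, B, C, D at the assumed tokens 0, 25, 50, 75 and a replication factor of 3, a key with token 10 is placed on B, C, D, and a key with token 80 wraps around to A, B, C.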
Architecture Contd…
• Gossip and Failure Detection
– The gossiper runs periodically and tracks which nodes are dead and which are alive.
• Uses the Phi Accrual Failure Detection algorithm
– Support decentralization and partition tolerance
Architecture Contd…
• Commit Logs
– A single log file per server
– Provides Durability
• Memtables
• SSTables
[Figure: commit log record layout with the fields "For what CF", "Flushed?" and "Data"]
Architecture Contd…
• Hinted Handoff
– Support Write Availability
[Figure: Node A receives a request for Node B; B is down, so A creates a hint]
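The flow in the figure can be modelled in a few lines. This is a toy in-memory sketch, assuming a coordinator that already knows which nodes are up. `HintedHandoffSketch` and all of its methods are hypothetical names, not Cassandra classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of hinted handoff; not Cassandra's implementation. */
final class HintedHandoffSketch {
    private final Set<String> liveNodes = new HashSet<>();
    // node -> (key -> value): the data each node actually holds
    private final Map<String, Map<String, String>> dataByNode = new HashMap<>();
    // target node -> writes that could not be delivered while it was down
    private final Map<String, List<String[]>> hintsByTarget = new HashMap<>();

    /** Marks a node as up and replays any hints stored for it. */
    void markUp(String node) {
        liveNodes.add(node);
        List<String[]> hints = hintsByTarget.remove(node);
        if (hints != null) {
            for (String[] hint : hints) {
                store(node, hint[0], hint[1]);
            }
        }
    }

    void markDown(String node) {
        liveNodes.remove(node);
    }

    /** Writes to the target if it is up; otherwise keeps a hint. Returns true if written directly. */
    boolean write(String target, String key, String value) {
        if (liveNodes.contains(target)) {
            store(target, key, value);
            return true;
        }
        hintsByTarget.computeIfAbsent(target, t -> new ArrayList<>()).add(new String[] {key, value});
        return false;
    }

    /** Reads what a node holds for a key; returns null if it has nothing. */
    String read(String node, String key) {
        Map<String, String> rows = dataByNode.get(node);
        return rows == null ? null : rows.get(key);
    }

    int pendingHints(String target) {
        List<String[]> hints = hintsByTarget.get(target);
        return hints == null ? 0 : hints.size();
    }

    private void store(String node, String key, String value) {
        dataByNode.computeIfAbsent(node, n -> new HashMap<>()).put(key, value);
    }
}
```

The write is accepted even though B is down, which is what keeps writes available. B only sees the data once it comes back and the hint is replayed.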
Architecture Contd…
• Compaction
– Two types
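• Minor – Runs automatically and merges a few similar-sized SSTables (the SSTables themselves are created by Memtable flushes)
• Major – Merges all SSTables of a column family into one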
• Bloom Filters - Is this data with You?
• Tombstones
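The "Is this data with you?" question that a Bloom filter answers can be sketched with a bit set and a few hash probes. This is a minimal illustration of the general technique, not Cassandra's own filter: `SimpleBloomFilter`, its sizing parameters, and the hashing scheme (derived from `String.hashCode`) are assumptions made for the example.

```java
import java.util.BitSet;

/** Hypothetical minimal Bloom filter: may report false positives, never false negatives. */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    SimpleBloomFilter(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.size = size;
        this.hashCount = hashCount;
        this.bits = new BitSet(size);
    }

    /** Derives the i-th bit position from two base hashes (double hashing). */
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so successive probes differ
        return Math.floorMod(h1 + i * h2, size);
    }

    /** Records a key by setting all of its bit positions. */
    void add(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(key, i));
        }
    }

    /** Returns false only if the key was definitely never added. */
    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }
}
```

A "no" from the filter is definitive, so the disk lookup can be skipped. A "yes" only means the data might be there and still has to be checked.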
Architecture Contd…
• Anti-Entropy and Read Repair
– Replica synchronization
• Staged Event-Driven Architecture (SEDA)
– Read, Mutation, Gossip, Response, Anti-Entropy, Load Balance, Migration, and Streaming
Configuring Cassandra Contd…
• Replication Factor (RF)
RF = 3
Read – R
Write – W
Quorum – Q = floor(RF/2) + 1
Strong consistency: R + W > RF
2 + 2 > 3 (R = W = Q)
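The quorum arithmetic can be checked with a couple of one-line helpers. This is only a worked example of the two formulas; the class and method names are hypothetical.

```java
/** Hypothetical helpers for the quorum and strong-consistency formulas. */
final class ConsistencyMath {
    /** Q = floor(RF / 2) + 1; Java integer division already rounds down for positive RF. */
    static int quorum(int replicationFactor) {
        if (replicationFactor <= 0) {
            throw new IllegalArgumentException("replication factor must be positive");
        }
        return replicationFactor / 2 + 1;
    }

    /** Reads and writes overlap on at least one replica when R + W > RF. */
    static boolean isStronglyConsistent(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }
}
```

For RF = 3 the quorum is 2, and reading and writing at quorum gives 2 + 2 > 3, so every read overlaps the latest write on at least one replica.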
Configuring Cassandra Contd…
• Partitioners – how is sharding done?
– Random Partitioner
– Order-Preserving Partitioner
– Byte-Ordered Partitioner
Good sharding -> even workload -> good performance
Configuring Cassandra Contd…
• Snitches – Who are my neighbours?
– Simple Snitch
• Comparing different octets in the IP addresses
– PropertyFileSnitch
• Creating and Managing a Cluster
– The bootstrap token – How do I know which data is mine?
– Seed Nodes – I can get my token as well as my data from them
Reading Data
• Read at any node!
[Figure: a read request with RF = 3, R = 2, W = 1. 2 nodes are read synchronously; the remaining 1 node is read asynchronously for read repair.]
Monitoring
• Provides rich JMX-based monitoring
– Each SEDA stage
– Database
• Caches, Column Family Stores, the Commit Log, and the Compaction Manager
– Some Metrics
• Read Count, Read Latency, Write Count and Write Latency, Pending Tasks
Cassandra in Action
• Setting Up a Cluster
– Add / Remove nodes
• Create and Configure a Keyspace
• Create and Configure a CF
• Writing Data
• Reading Data
• Monitoring through JMX
Hector Client API
// Connect to the cluster
Cluster cluster = HFactory.createCluster(clusterName,
        cassandraHostConfigurator, credentials);

// Create a keyspace
KeyspaceDefinition definition = HFactory.createKeyspaceDefinition(keyspaceName,
        replicationStrategy, replicationFactor, cfs);
cluster.addKeyspace(definition);

// Create a column family
ColumnFamilyDefinition cfDefinition = HFactory.createColumnFamilyDefinition(keyspaceName, cfName);
cluster.addColumnFamily(cfDefinition);
Hector Client API Contd…
// Write a column
Keyspace keyspace = HFactory.createKeyspace(keyspaceName, cluster);
Mutator<String> mutator = HFactory.createMutator(keyspace, new StringSerializer());
mutator.insert(rowKey, cfName, HFactory.createStringColumn(name, value));

// Read a column (reuses the keyspace created above)
ColumnQuery<String, String, String> columnQuery = HFactory.createStringColumnQuery(keyspace);
columnQuery.setColumnFamily(cfName).setKey(rowKey).setName(cName);
QueryResult<HColumn<String, String>> result = columnQuery.execute();
Conclusion
• We discussed the data model, the architecture, and how data is read and written
• We did a simple tutorial on how to set up a cluster and write and read data