Introduction to Cassandra and datastax DSE
2013 © Trivadis
BASEL BERN BRUGES LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MUNICH STUTTGART VIENNA
Cassandra Architecture and Data Model, Geneva, 26.01.2015
Ulises Fasoli
Senior Consultant
Trivadis AG
January 2016: Cassandra Architecture and Data Model
Agenda
1. Introduction to NoSQL datastores and Polyglot Persistence
2. What is Apache Cassandra?
3. Why Cassandra, What is DataStax?
4. Cassandra Architecture
5. Cassandra Data Model
6. Cassandra Query Language (CQL)
7. Cassandra/DataStax @ Trivadis
History of Databases
1960s File-based, Network (CODASYL) and Hierarchical Databases
1970s Relational Database
1980s SQL became the standard query language
Early 1990s Object Databases
Late 1990s XML Databases
2004 NoSQL Databases
What's wrong with Relational Databases?
• SQL provides a rich, declarative query language
• Databases enforce referential integrity
• ACID semantics
• Well understood by developers and database administrators
• Well supported by different languages, frameworks and tools
  - Hibernate, JPA, JDBC, iBATIS, Entity Framework
• Well understood and accepted by operations people (DBAs)
  - Configuration
  - Monitoring
  - Backup and Recovery
  - Tuning
  - Design
They are great ….
Relational Databases are great ... But!
New trends
Big Data
Concurrency
Connectivity
Diversity
P2P Knowledge
Cloud/Grid
Relational Databases are great ... But!
Problem: Complex Object Graphs
Object/Relational impedance mismatch
Complicated to map a rich domain model to a relational schema
Performance issues
• Many rows in many tables
• Many joins
• Eager vs. lazy loading
[Figure: an order object graph mapped to the tables ORDER, ORDER_LINES, CUSTOMER and ADDRESS]
Order ID: 1001, Order Date: 15.9.2012
Customer: First Name: Peter, Last Name: Sample
Billing Address: Street: Somestreet 10, City: Somewhere, Postal Code: 55901
Line Items:
Name | Quantity | Price
Ipod Touch | 1 | 220.95
Monster Beat | 2 | 190.00
Apple Mouse | 1 | 69.90
Relational Databases are great ... But!
Problem: Schema evolution
Adding attributes to an object => columns have to be added to the table
Expensive if there is a lot of data in that table
Locks are held on the tables for a long time
What if the new values should be mandatory? A NOT NULL constraint cannot be enforced for rows that already exist
Application downtime …
Relational Databases are great ... But!
Problem: Semi-structured data
A relational schema doesn't easily handle semi-structured data
Common solutions:
• Name/Value table
  - Poor performance
  - Lack of constraints
• Serialize as BLOB
  - Fewer joins, but no query capabilities
Relational Databases are great ... But!
Problem: Scaling
Scaling writes is difficult/expensive/impossible => Big Data
Scaling a relational database:
• Vertical scaling is limited and expensive
• Horizontal scaling is limited and expensive
[Figure: clients connecting to RDBMS databases in four stages: Single DB => Partitioned Table (P1, P2, P3) => Database Sharding => Database Cluster (Node 1, Node 2)]
So, what’s Wrong With RDBMS?
• Many programmers are already familiar with it.
• Transactions and ACID make development easy.
• Lots of tools to use.
• Rigid schema design.
• Harder to scale.
• Replication.
Nothing
No one size fits all
Solution: NoSQL ?
No standard definition of what NoSQL means
• Not Only SQL and not No SQL
• Not only relational would have been better
Term began in a workshop organized in 2009
Use the right tools (DBs) for the job
It is more like a feature set, or even the absence of a feature set
Use Cases for NoSQL
• Massive write performance.
• Fast key value look ups.
• Flexible schema and data types.
• No single point of failure.
• Fast prototyping and development.
• Out of the box scalability.
• Easy maintenance.
Brewer's CAP Theorem
Any networked shared-data system can have at most two of the three desirable properties:
• Consistency: all of the nodes see the same data at the same time, regardless of where the data is stored
• Availability: node failures do not prevent survivors from continuing to operate
• Network Partition tolerance: the system continues to operate despite arbitrary message loss
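[Figure: Venn diagram of Consistency, Availability and Network Partition Tolerance; the pairwise overlaps are labelled CA, CP and AP, and the overlap of all three is labelled n/a]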
Data Store Positioning
[Figure: scalability (vertical axis) plotted against standardized model, tooling, complexity (horizontal axis) for Key-value, Wide Column (Column Families / Extensible Records), Document, Graph and Relational stores; additional labels: SQL Comfort Zone, Multi Dimensional]
Polyglot Persistence
In 2006, Neal Ford coined the term Polyglot Programming
Applications should be written in a mix of languages to take advantage of the fact that different languages are suitable for tackling different problems
Polyglot Persistence defines a hybrid approach to persistence
Using multiple data storage technologies
Selected based on the way data is being used by individual applications
Why store binary images in RDBMs, when there are better storage systems?
Polyglot Persistence
Today we use the same database for all kinds of data
• Business transactions, session management data, reporting, logging information, content information, ...
Not all of this data needs the same availability, consistency or backup properties
Polyglot data storage makes it possible to mix and match relational and NoSQL data stores
[Figure: two persistence models for an e-commerce application]
"Traditional" Persistence Model
• Shopping cart data, user sessions, product catalog, recommendations and completed orders are all stored in a single RDBMS
Polyglot Persistence Model
• The same data (shopping cart data, user sessions, product catalog, recommendations, completed orders) is spread across Key-Value, RDBMS, Document and Graph stores
Definition of Cassandra
Apache Cassandra™ is a free
• Distributed…
• High performance…
• Extremely scalable…
• Fault tolerant (i.e. no single point of failure)…
post-relational database solution.
Cassandra can serve as both real-time Datastore (the "system of record") for online/transactional applications, and as a read-intensive database for business intelligence systems.
History of Cassandra
[Figure: Cassandra's lineage: Bigtable and Dynamo]
Architecture Overview
Cassandra was designed with the understanding that system/hardware failures can and do occur :
• Peer-to-peer, distributed system
• All nodes the same
• Data partitioned among all nodes in the cluster
• Custom data replication to ensure fault tolerance
• Read/Write-anywhere design
Big Data Scalability
• Capable of comfortably scaling to petabytes
• New nodes = Linear performance increases
• Add new nodes online
Who is using Cassandra?
Largest publicly known cluster has over 300 TB of data spanning 400 machines
Why Cassandra?
Tunable data consistency
Flexible schema design
Data Compression
CQL language (like SQL)
Support for key languages and platforms
No need for special hardware or software
Gigabyte to Petabyte scalability
Linear performance gains through adding nodes
No single point of failure
Easy replication / data distribution
Multi-data center and Cloud capable
No need for separate caching layer
Cassandra Use Cases
Product Catalog / Playlists
Personalization
• Ads
• Recommendations
• Ratings
Fraud Detection
Time Series
• Finance
• Smart Meter
IoT / Sensor Data
Graph / Network data
DataStax Enterprise Edition (DSE)
Datastax OpsCenter
Architecture Overview
Nodes communicate with each other through the Gossip protocol, which exchanges state information across the cluster every second
A commit log on each node captures write activity, which assures data durability
Data is also written to an in-memory structure (the memtable) and then flushed to disk as an SSTable once the memory structure is full
No Single Point of Failure
All nodes the same
Customized replication affords tunable data redundancy
Read/write from any node
Can replicate data among different physical data center racks
Easy Replication / Data Distribution
Transparently handled by Cassandra
Multi-data center capable
Exploits all the benefits of Cloud computing
Able to do hybrid Cloud/On-premise setup
Partitioning
• Nodes are logically structured in Ring Topology.
• Hashed value of key associated with data partition is used to assign it to a node in the ring.
• Lightly loaded nodes move position on the ring to alleviate highly loaded nodes.
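The ring lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not Cassandra's implementation: the `Ring` class, the MD5-based `token` function and the node names are assumptions made for the example.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a partition key to a ring position (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical ring: each node owns the range that ends at its token."""

    def __init__(self, node_tokens):
        # node_tokens maps a node name to its position on the ring.
        self._entries = sorted((tok, name) for name, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._entries]

    def owner_for_token(self, tok: int) -> str:
        # First node whose token is >= tok; wrap around past the last node.
        idx = bisect.bisect_left(self._tokens, tok)
        return self._entries[idx % len(self._entries)][1]

    def owner(self, key: str) -> str:
        return self.owner_for_token(token(key))


# Four evenly spaced nodes on a 128-bit ring.
ring = Ring({"A": 0, "B": 2**126, "C": 2**127, "D": 3 * 2**126})
```

Moving a node's token, as the last bullet describes, changes only the range boundary it shares with its neighbour, so only the keys in that range change owner.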
Data Replication
Replication for high availability and data durability
• Replication factor N: each row is replicated at N nodes
• Each row key k is assigned to a coordinator node
• The coordinator node is responsible for replicating the rows within its key range
Partitioning and Replication
[Figure: ring over the hash range 0 to 1 (with 1/2 at the opposite side) holding nodes A to F; keys are placed at h(key1) and h(key2) and replicated with N=3]
Data Replication
Each data item is replicated at N (replication factor) nodes.
Different Replication Policies
Rack Unaware – replicate data at N-1 successive nodes after its coordinator
Rack Aware – uses 'Zookeeper' to choose a leader which tells nodes the range they are replicas for
Datacenter Aware – similar to Rack Aware but leader is chosen at Datacenter level instead of Rack level.
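As a rough illustration of the Rack Unaware policy listed above, the sketch below returns the coordinator followed by its N-1 successors on the ring. The function name and the six-node ring are hypothetical, chosen to match the A to F ring used earlier in the deck.

```python
def rack_unaware_replicas(ring_nodes, coordinator, n):
    """Return the coordinator plus the next n-1 nodes in ring order.

    ring_nodes: node names listed in ring order (illustrative input).
    """
    if n > len(ring_nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    start = ring_nodes.index(coordinator)
    # Walk clockwise from the coordinator, wrapping at the end of the ring.
    return [ring_nodes[(start + i) % len(ring_nodes)] for i in range(n)]


replicas = rack_unaware_replicas(["A", "B", "C", "D", "E", "F"], "E", 3)
```

With N=3 and coordinator E, the replicas wrap around the end of the ring to include A.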
Write Path
When a write occurs, Cassandra stores the data in a structure in memory, the Memtable, and also appends writes to the commit log on disk, providing configurable durability.
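A toy model of this write path is sketched below. The `Node` class, its in-memory lists and the `memtable_limit` parameter are assumptions for illustration; a real node persists the commit log and SSTables on disk and flushes based on memory thresholds rather than a row count.

```python
class Node:
    """Hypothetical single node: commit log + memtable + flushed SSTables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []      # stands in for the append-only log on disk
        self.memtable = {}        # row key -> {column: (value, timestamp)}
        self.sstables = []        # immutable flushed tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, row_key, column, value, timestamp):
        # 1. Append to the commit log so the write survives a crash.
        self.commit_log.append((row_key, column, value, timestamp))
        # 2. Apply the write to the in-memory memtable.
        self.memtable.setdefault(row_key, {})[column] = (value, timestamp)
        # 3. Flush once the memtable is "full" (row count is a simplification).
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # The memtable becomes an immutable SSTable; a fresh memtable starts.
        self.sstables.append(self.memtable)
        self.memtable = {}


node = Node(memtable_limit=2)
node.write("john", "age", 37, timestamp=1)
```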
Write Requests
Coordinator sends a write request to all replicas that own the row being written
Write Consistency
The consistency level for writing to Cassandra specifies on how many replicas the write must succeed before an ACK is returned to the client
• Quorum: floor(replication_factor / 2) + 1, i.e. a majority of the replicas
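The quorum formula uses integer division, which is easy to get wrong for even replication factors. The helper below is an illustrative sketch; `required_acks` is a hypothetical name, and only the ONE, QUORUM and ALL levels are modelled.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def required_acks(level: str, replication_factor: int) -> int:
    """Number of replica acknowledgements needed for a consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")
```

For a replication factor of 3 a quorum is 2 replicas, so one replica may be down without failing the write.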
Read Path
When a read request for a row comes in to a node, the row must be combined from all SSTables on that node that contain columns from the row in question
as well as from any unflushed memtables, to produce the requested data
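The merge step can be sketched as follows, assuming each SSTable and the memtable are plain dictionaries of the form row key -> {column: (value, timestamp)}. This layout and the `read_row` function are assumptions for the example; for each column the value with the newest timestamp is kept.

```python
def read_row(row_key, sstables, memtable):
    """Combine one row from all SSTables and the unflushed memtable."""
    merged = {}
    for source in [*sstables, memtable]:
        for column, (value, ts) in source.get(row_key, {}).items():
            # Keep the most recent version of each column.
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    # Strip the timestamps before returning the row to the caller.
    return {column: value for column, (value, _) in merged.items()}


sstables = [
    {"john": {"age": (37, 1), "role": ("dev", 1)}},
    {"john": {"age": (38, 5)}},
]
memtable = {"john": {"role": ("lead", 3)}}
row = read_row("john", sstables, memtable)
```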
Read Requests
There are two types of read requests that a coordinator can send to a replica:
• A direct read request
• A background read repair request
The number of replicas contacted by a direct read request is determined by the consistency level specified by the client.
Read Consistency
The consistency level for reading from Cassandra specifies how many replicas must respond before a result is returned to the client
• Quorum: floor(replication_factor / 2) + 1, i.e. a majority of the replicas
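Read and write consistency levels interact: a read is guaranteed to reach at least one replica that holds the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The slides do not state this rule; it is added here as a commonly used rule of thumb, and the function name is hypothetical.

```python
def read_overlaps_write(read_replicas: int, write_replicas: int,
                        replication_factor: int) -> bool:
    """True when every read set must intersect every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


# QUORUM reads and QUORUM writes with RF = 3: 2 + 2 > 3.
quorum_both = read_overlaps_write(2, 2, 3)
# ONE for both reads and writes with RF = 3: 1 + 1 > 3 is false.
one_both = read_overlaps_write(1, 1, 3)
```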
Cassandra Data Model
• Table is a multi dimensional map indexed by key (row key).
• Columns are grouped into Column Families
• Dynamic schema design allows for much more flexible data storage than rigid RDBMS
• Each Column has
  - Name
  - Value
  - Timestamp
How Cassandra stores data
• Model brought from Google Bigtable
• Row Key and a lot of columns
• Column names sorted (UTF8, Int, Timestamp, etc.)
[Figure: a row consists of a Row Key followed by sorted columns numbered 1 to 2 billion; each column holds a Column Name, Column Value, Timestamp and TTL; a table can hold billions of rows]
Cassandra Data Model
[Figure: a Keyspace containing several Column Families]
Row, row key, column key, and column value
[Figure: a row made up of a row key followed by the column keys (or column names) cola, colb, colc, cold with their column values (or cells) va, vb, vc, vd]
• Rows: individual rows constitute a column family
• Row key: uniquely identifies a row in a column family
• Row: stores pairs of column keys and column values
• Column key: uniquely identifies a column value in a row
• Column value: stores one value or a collection of values
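One way to picture these terms is as a map of maps: a column family maps row keys to rows, and each row maps column keys to column values, read back in column-key order. The helpers below are an illustrative sketch with hypothetical names, not a Cassandra API.

```python
def put(column_family, row_key, column_key, value):
    """Store a column value under (row key, column key)."""
    column_family.setdefault(row_key, {})[column_key] = value


def slice_columns(column_family, row_key, start, end):
    """Return the (column key, value) pairs of one row with start <= key <= end,
    in sorted column-key order."""
    row = column_family.get(row_key, {})
    return [(key, row[key]) for key in sorted(row) if start <= key <= end]


cf = {}
put(cf, "row1", "colb", "vb")
put(cf, "row1", "cola", "va")
put(cf, "row1", "cold", "vd")
first_two = slice_columns(cf, "row1", "cola", "colb")
```

Sorting at read time keeps the sketch short; Cassandra keeps the columns of a row sorted in storage, which is what makes column slices cheap.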
Static vs. Dynamic Column Family
Static column family (skinny rows)
• Contains a predefined set of columns with metadata
• Number of columns can vary across multiple rows within the column family
• Similar to an RDBMS table, except that there are no NULL values
[Figure: two skinny rows with different column sets]
John Lennon: born = 1940, country = England, died = 1980, style = Rock, type = artist
The Beatles: country = England, founded = 1957, style = Rock, type = band
What is a wide row?
Rows may be described as "skinny" or "wide"
Wide row: has a relatively large number of column keys (hundreds or thousands); this number may increase as new data values are inserted
- For example, a row that stores all bands of the same style
- The number of such bands will increase as new bands are formed
Note that column values do not exist in this example
- The column key (in this case a band name) stores all the data desired
- Could have stored the number of albums, or year founded, etc., as column values
©2014 DataStax Training. Use only with permission.
[Figure: a wide row with row key Rock and column keys The Animals, The Beatles, ...]
What are composite row key and composite column key?
Composite row key: multiple components separated by a colon
- 'Revolver' and 1966 are the album title and year
- The 'tracks' value is a collection (map)
Composite column key: multiple components separated by a colon
- Composite column keys are sorted by each component
[Figure: two rows with the composite row key Revolver:1966]
Revolver:1966: genre = Rock, performer = The Beatles, tracks = {1: 'Taxman', ..., 14: 'Tomorrow Never Knows'}
Revolver:1966: 1:title = Taxman, 2:title = Eleanor Rigby, ..., 14:title = Tomorrow Never Knows
Data Modelling with Cassandra
• De-normalize, De-normalize, De-normalize
  - Forget about old-school 3NF
  - De-normalize wherever you can for quicker retrieval and let application logic handle the responsibility of reliably updating redundancies
• Rows are gigantic and sorted
  - Giga-sized rows (2 billion columns max) can be used to store sortable and sliceable columns
  - Comments by timestamp, ordered bids by quoted price, ratings by product, ...
• One row, one machine
  - Each row stays on one machine
  - Rows are not shared across nodes
  - Beware of this: don't create hotspots with a high-demand row!
From Query to Model
Remember this
• Cassandra finds rows fast
• Cassandra scans columns fast
• Cassandra does not scan rows
Cassandra API – Thrift vs. CQL
Thrift
• Exposes the internal storage structure of Cassandra pretty much directly
• Complicated, low-level, full control
• Legacy
CQL
• The new way to go
• Provides a thin abstraction layer over Cassandra's internal structure
• Hides some distracting and useless implementation details
• Provides native syntax for common encodings/idioms (like collections) instead of letting each client library re-implement them in its own, different and thus incompatible way
CQL Language
Very similar to RDBMS SQL syntax
Create objects via DDL (e.g. CREATE…)
Core DML commands supported: INSERT, UPDATE, DELETE
Query data with SELECT
Current version is CQL3
CQL Shell for Apache Cassandra
cqlsh is the command line utility for executing CQL commands (think of SQL*Plus for Cassandra)
CQL3 is default since Cassandra 1.2
$ cqlsh
Connected to DataStaxCluster at localhost:9160.
[cqlsh 4.1.0 | Cassandra 2.0.5.24 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh>
The CQL/Cassandra Mapping – Static Table
CREATE TABLE employee (
  name text PRIMARY KEY,
  age int,
  role text
);

CQL view:
 name | age | role
------+-----+------
 john |  37 | dev
 eric |  38 | ceo

Internal storage (row key, then columns):
john: age = 37, role = dev
eric: age = 38, role = ceo
Create a Dynamic table (wide-row) Employee
A Dynamic Table is also created with the CREATE TABLE statement but using a composite primary key
cqlsh:training> CREATE TABLE employees (
  company text,
  name text,
  age int,
  role text,
  PRIMARY KEY (company, name)
);
The CQL/Cassandra Mapping – Dynamic Table
CREATE TABLE employees (
  company text,
  name text,
  age int,
  role text,
  PRIMARY KEY (company, name)
);

CQL view:
 company | name | age | role
---------+------+-----+------
 OSC     | eric |  38 | ceo
 OSC     | john |  37 | dev
 RKG     | anya |  29 | lead
 RKG     | ben  |  27 | dev
 RKG     | chad |  35 | ops

Internal storage (row key, then composite columns):
OSC: eric:age = 38, eric:role = ceo, john:age = 37, john:role = dev
RKG: anya:age = 29, anya:role = lead, ben:age = 27, ben:role = dev, chad:age = 35, chad:role = ops
Insert data into Employee
The INSERT command is similar to the SQL counterpart
Major difference is that the PRIMARY KEY is always required
If the same statement is executed twice, there will be no error
If the same PRIMARY KEY value is reused with different values for the other columns, the last write wins!
cqlsh:training> INSERT INTO employee (name, age, role) VALUES ('john', 37, 'dev');
cqlsh:training> INSERT INTO employee (name, age, role) VALUES ('eric', 38, 'ceo');
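The "no error on repeat" and "last one wins" behaviour can be modelled with a small in-memory table. This is a sketch under assumptions: the `Table` class is hypothetical, and "last" is taken to mean the write with the highest timestamp, resolved per column.

```python
class Table:
    """Hypothetical table where every INSERT is an upsert keyed by primary key."""

    def __init__(self):
        self.rows = {}  # primary key -> {column: (value, timestamp)}

    def insert(self, primary_key, timestamp, **columns):
        row = self.rows.setdefault(primary_key, {})
        for name, value in columns.items():
            current = row.get(name)
            # Overwrite only if this write is at least as recent.
            if current is None or timestamp >= current[1]:
                row[name] = (value, timestamp)

    def select(self, primary_key):
        return {name: value
                for name, (value, _) in self.rows.get(primary_key, {}).items()}


employee = Table()
employee.insert("john", 1, age=37, role="dev")
employee.insert("john", 1, age=37, role="dev")  # repeating the insert is not an error
employee.insert("john", 2, age=38)              # newer value for one column wins
```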
Retrieving data from Employee table (II)
Restriction on column other than PRIMARY KEY won't work
Can be solved with an Index (but be careful, better use de-normalization)
cqlsh:training> SELECT * FROM employee WHERE age = 37;
Bad Request: No indexed columns present in by-columns clause with Equal operator

cqlsh:training> CREATE INDEX employee_age_idx ON employee (age);
cqlsh:training> SELECT * FROM employee WHERE age = 37;

 name | age | role
------+-----+------
 john |  37 | dev

(1 rows)
Update data in Employee
The UPDATE statement is similar to the SQL UPDATE command
Just as with the INSERT, the PRIMARY KEY column must be specified as part of the UPDATE
In CQL, UPDATE does not check whether the row exists; if it does not, CQL simply creates it
cqlsh:training> UPDATE employee SET age = 38 WHERE name = 'john';
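The upsert behaviour of INSERT and UPDATE described above can be modelled in a few lines of Python. This is a sketch under stated assumptions: a dictionary keyed by the primary key stands in for the table, the `upsert` helper and the `anna` row are hypothetical, and nothing here reflects how Cassandra actually stores data.

```python
# Toy model: primary key (name) -> dict of the remaining columns.
employee = {}


def upsert(name, **columns):
    """Create the row if it is missing, then overwrite only the given columns."""
    employee.setdefault(name, {}).update(columns)


upsert("john", age=37, role="dev")  # like the INSERT for 'john'
upsert("john", age=37, role="dev")  # same statement again: no error, still one row
upsert("john", age=38)              # like UPDATE ... SET age = 38 WHERE name = 'john'
upsert("anna", role="ops")          # an "update" of a missing row simply creates it

print(employee)
```

In this model there is no separate code path for "insert" and "update", which is why running a statement twice is harmless and why the most recent value for a column is the one that is read back.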
Cassandra Data Types
Category | CQL Data Type | Description
String   | ascii         | US-ASCII character string
String   | text          | UTF-8 encoded string, used most of the time for storing String data
String   | varchar       | UTF-8 string
String   | inet          | Used for storing IP addresses
Numeric  | int           | 32-bit signed integer
Numeric  | float         | 32-bit IEEE-754 floating point
Numeric  | double        | 64-bit IEEE-754 floating point
Numeric  | varint        | Arbitrary-precision integer
Numeric  | bigint        | 64-bit number, equivalent to long
Numeric  | decimal       | Variable-precision decimal
Numeric  | counter       | Distributed counter value (64-bit long)
Cassandra Data Types (II)
Category      | CQL Data Type | Description
UUIDs         | uuid          | A UUID in standard UUID format
UUIDs         | timeuuid      | Type 1 UUID only, for storing unique time-based IDs
Collections   | list          | Ordered collection of one or more elements
Collections   | map           | Collection of arbitrary key-value pairs
Collections   | set           | Unordered collection of one or more unique elements
Miscellaneous | boolean       | Boolean (true/false)
Miscellaneous | blob          | Used for storing binary data written in hexadecimal
Miscellaneous | timestamp     | Date/Time
Cassandra Data Types (III)
TimeUUID
• Has a few extra functions that allow extracting the time information
• now() returns a new TimeUUID with the time of the current timestamp and ensures globally unique values
• minTimeuuid() and maxTimeuuid() are used when querying ranges of TimeUUIDs

Counter
• A table cannot mix counter columns with non-key columns of other types
• Value cannot be set, only incremented/decremented by a specified amount
• Counters may not be part of the PRIMARY KEY of the table
WHERE event_time > maxTimeuuid('2013-01-01 00:05+0000') AND event_time < minTimeuuid('2013-02-02 10:00+0000')
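To make the "time information" inside a TimeUUID concrete, here is a minimal Python sketch using the standard `uuid` module, whose `uuid1()` also produces version 1 (time-based) UUIDs. The helper name `unix_seconds` is an assumption for illustration; it performs the same kind of arithmetic that a time-extraction function has to do, and is not Cassandra's own code.

```python
import time
import uuid

# Number of 100-nanosecond intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def unix_seconds(u):
    """Return the timestamp embedded in a version 1 UUID as Unix seconds."""
    return (u.time - UUID_EPOCH_OFFSET) / 1e7


event_id = uuid.uuid1()            # a time-based UUID, comparable to now()
extracted = unix_seconds(event_id)

print(event_id, event_id.version)
print(extracted, time.time())      # the two values should be very close
```

Because the timestamp is part of the identifier, range queries like the WHERE clause above only need a lower and an upper bound UUID for the two instants.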
Collections
CQL3 also supports collections for storing complex data structures
• Set {value,…}, List [value,…], Map {key:value,…}
cqlsh:training> CREATE TABLE collection_sample (
    id int PRIMARY KEY,
    string_set set<text>,
    string_list list<text>,
    string_map map<text, text>);
cqlsh:training> INSERT INTO collection_sample (id, string_set, string_list, string_map) VALUES (1, {'text1','text2','text1'}, ['text1','text2','text1'], {'key1':'value1'});
Collections (II)
cqlsh:training> SELECT * FROM collection_sample;
 id | string_list                 | string_map         | string_set
----+-----------------------------+--------------------+--------------------
  1 | ['text1', 'text2', 'text1'] | {'key1': 'value1'} | {'text1', 'text2'}
(1 rows)
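The result above shows the duplicate 'text1' surviving in the list but not in the set. Python's built-in collection literals happen to use the same notation, so the difference can be reproduced with a short sketch (an analogy only; the variable names simply mirror the column names of the sample table).

```python
# Same literals as in the INSERT statement, evaluated as Python collections.
string_set = {"text1", "text2", "text1"}   # duplicates collapse
string_list = ["text1", "text2", "text1"]  # order and duplicates are kept
string_map = {"key1": "value1"}            # key -> value pairs

print(sorted(string_set))  # sorted only to get a stable display order
print(string_list)
print(string_map)
```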
Counter Columns
Create a Counter Column Table that counts “favorite” events
cqlsh:training> CREATE TABLE favorites (
    product_id int,
    month int,
    number counter,
    PRIMARY KEY (product_id, month));
cqlsh:training> UPDATE favorites SET number = number + 1 WHERE product_id = 4910 AND month = 06;
cqlsh:training> SELECT * FROM favorites;
 product_id | month | number
------------+-------+--------
       4910 |     6 |      1
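As a rough mental model of the counter table, the following Python sketch keys a counter by the same (product_id, month) pair and only ever applies increments or decrements. It is an illustration under assumptions: the `increment` helper is hypothetical, and a single in-process `Counter` ignores everything that makes distributed counters hard.

```python
from collections import Counter

# (product_id, month) -> running total, like the "number" column.
favorites = Counter()


def increment(product_id, month, amount=1):
    """Apply a relative change; there is deliberately no way to set a value."""
    favorites[(product_id, month)] += amount


increment(4910, 6)      # number = number + 1
increment(4910, 6, 2)   # number = number + 2
increment(4910, 6, -1)  # number = number - 1

print(favorites[(4910, 6)])
```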
Time-to-Live (TTL) on Insert
Insert a row with a TTL of 30 seconds; once the TTL has expired, the row is deleted and no longer returned
cqlsh:training> INSERT INTO employee (name, age, role) VALUES ('bob', 29, 'dev') USING TTL 30;
cqlsh:training> SELECT TTL(role) FROM employee WHERE name='bob';

 ttl(role)
-----------
        22
cqlsh:training> SELECT TTL(role) FROM employee WHERE name='bob';
(0 rows)
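The TTL behaviour shown above (22 seconds remaining, then no row at all) can be sketched as a small Python model. Everything here is an assumption made for illustration: the `TtlTable` class is hypothetical, it expires whole rows rather than individual cells, and the clock is injected so the example does not have to wait 30 real seconds.

```python
class TtlTable:
    """Toy table whose rows carry an expiry time."""

    def __init__(self, clock):
        self._clock = clock  # callable returning the current time in seconds
        self._rows = {}      # key -> (row, expires_at)

    def insert(self, key, row, ttl):
        """Store the row together with the instant at which it expires."""
        self._rows[key] = (row, self._clock() + ttl)

    def ttl(self, key):
        """Return the remaining seconds, or None if the row is absent or expired."""
        entry = self._rows.get(key)
        if entry is None:
            return None
        remaining = entry[1] - self._clock()
        return remaining if remaining > 0 else None


now = [0]  # fake clock, advanced by hand
table = TtlTable(clock=lambda: now[0])
table.insert("bob", {"age": 29, "role": "dev"}, ttl=30)

now[0] = 8
remaining_after_8s = table.ttl("bob")   # 30 - 8 seconds left

now[0] = 31
remaining_after_31s = table.ttl("bob")  # expired, behaves like a missing row

print(remaining_after_8s, remaining_after_31s)
```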
Agenda
1. Introduction to NoSQL datastores and Polyglot Persistence
2. What is Apache Cassandra?
3. Why Cassandra, What is DataStax?
4. Cassandra Architecture
5. Cassandra Data Model
6. Cassandra Query Language (CQL)
7. Cassandra/DataStax @ Trivadis
Trivadis / DataStax Partnership
• We have been a DataStax silver partner since December 2014
• DataStax Partner Network (DSPN)
• Available certifications
  • Admin
  • Developer
  • Architect
• Currently only one other partner in Switzerland: Intersys
• http://www.datastax.com/partners
Questions and answers ...
Ulises Fasoli
Senior consultant
+41 21 321 47 00