Apache Cassandra and DataStax - Meetupfiles.meetup.com/3139382/ApacheCassandraAndDatastax.pdf ·...
DataStax EMEA
Apache Cassandra and DataStax
Agenda
1. Apache Cassandra
2. Cassandra Query Language
3. DataStax Enterprise
4. Realtime Analytics
5. What's New
About me
Christian Johannsen
Solutions Engineer @ DataStax
@cjohannsen81
Apache Cassandra
What is Apache Cassandra
• Apache Cassandra is a distributed (“NoSQL”) database
• massively scalable and available
• providing extreme performance
• designed to handle big data workloads
• across multiple data centers
• no single point of failure
[Diagram: Dynamo and BigTable]
BigTable: http://research.google.com/archive/bigtable-osdi06.pdf
Dynamo: http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
Cassandra Architecture
[Diagram: a client request reaches Node1 and Node2; each node comprises the layers API (CQL & RPC), CLUSTER (DYNAMO), DATA STORE (BIG TABLES) and DISKS]
Scale, anybody?
• Cassandra works from very small to very large deployments
• Cassandra footprint @ Apple*
  • 75,000+ nodes
  • 10s of petabytes of data
  • Millions of ops/second
  • Largest cluster 1000+ nodes
*Cassandra Summit, USA, September 2014
Why Cassandra?
• Masterless architecture with read/write anywhere design
• Continuous availability with no single point of failure
• Multi-data center and availability zone support
• Flexible data model for unstructured, semi-structured and structured data
• Linearly scalable performance with online expansion (scale-out and scale-up)
• Security with integrated authentication
• Operationally simple
• CQL - Cassandra Query Language
Cassandra - Locally Distributed
• Client reads or writes to any node
• Node coordinates with others (gossip protocol)
• Data read or replicated in parallel
• RF = 3 in this example
• Each node stores 60% of the cluster's data, i.e. 3/5
[Diagram: five-node ring; Node 1 holds the 1st copy, Node 2 the 2nd copy, Node 3 the 3rd copy; Nodes 4 and 5 are shown without a copy]
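The 3/5 figure can be checked with a minimal sketch. It assumes a simplified ring with one token range per node, where each row is stored on the owning node plus the next RF-1 nodes clockwise. The helper name `replicas_for` is hypothetical, and a real cluster hashes partition keys and usually uses virtual nodes.

```python
from collections import Counter


def replicas_for(owner_index, nodes, rf):
    """Return the rf nodes holding a range: the owner plus the next rf-1 nodes clockwise."""
    return [nodes[(owner_index + i) % len(nodes)] for i in range(rf)]


nodes = ["Node 1", "Node 2", "Node 3", "Node 4", "Node 5"]
rf = 3

# Count how many of the five token ranges each node ends up storing.
load = Counter()
for owner in range(len(nodes)):
    for node in replicas_for(owner, nodes, rf):
        load[node] += 1

# Fraction of the cluster's data held by each node: RF / number of nodes.
share = {node: load[node] / len(nodes) for node in nodes}
print(replicas_for(0, nodes, rf))
print(share)
```

With RF = 3 and five nodes, every node stores three of the five ranges, which is where the 60% comes from.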
Cassandra - Rack/Zone aware
[Diagram: five nodes spread over Rack 1, Rack 2 and Rack 3; Node 1 holds the 1st copy, Node 3 the 2nd copy, Node 5 the 3rd copy; Nodes 2 and 4 are shown without a copy]
• Cassandra is aware of which rack or zone each node resides in
• It will attempt to place each data copy in a different rack
• RF=3 in this example
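As a rough illustration of rack-aware placement, the sketch below walks the ring clockwise and prefers nodes in racks that do not yet hold a copy. The rack assignment of each node is an assumption made for the example (the slide does not state it), and `rack_aware_replicas` is a hypothetical helper, not Cassandra's actual implementation.

```python
def rack_aware_replicas(start, ring, rf):
    """Walk the ring clockwise from `start`, preferring nodes in racks not yet used."""
    chosen, used_racks, skipped = [], set(), []
    for i in range(len(ring)):
        node, rack = ring[(start + i) % len(ring)]
        if rack not in used_racks:
            chosen.append(node)
            used_racks.add(rack)
        else:
            skipped.append(node)  # same rack as an earlier copy, keep as fallback
        if len(chosen) == rf:
            return chosen
    # Fewer racks than rf: fill up with nodes that were skipped earlier.
    return chosen + skipped[: rf - len(chosen)]


# Assumed layout: two nodes in Rack 1, two in Rack 2, one in Rack 3.
ring = [
    ("Node 1", "Rack 1"),
    ("Node 2", "Rack 1"),
    ("Node 3", "Rack 2"),
    ("Node 4", "Rack 2"),
    ("Node 5", "Rack 3"),
]
print(rack_aware_replicas(0, ring, rf=3))
```

Under this assumed layout the three copies land on Node 1, Node 3 and Node 5, one per rack.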
Cassandra - DC/Region aware
• Active Everywhere - reads/writes in multiple data centres
• Client writes local
• Data syncs across WAN
• Replication Factor per DC
• Different number of nodes per data center
[Diagram: two five-node rings, DC: USA and DC: EUROPE; in each ring Node 1 holds the 1st copy, Node 2 the 2nd copy and Node 3 the 3rd copy]
Cassandra - Tuneable Consistency
• Consistency Level (CL)
  • Client specifies per operation
  • Handles multi-data center operations
• ALL = All replicas ack
• QUORUM = A majority (more than 50%) of replicas ack
• LOCAL_QUORUM = A majority of replicas in the local DC ack
• ONE = Only one replica acks
• Plus more… (see docs)
• Blog: Eventual Consistency != Hopeful Consistency
  http://planetcassandra.org/blog/post/a-netflix-experiment-eventual-consistency-hopeful-consistency-by-christos-kalantzis/
[Diagram: parallel write with CL=QUORUM on a five-node ring with copies on Nodes 1, 2 and 3; acks shown at 5 μs, 12 μs, 12 μs and 500 μs]
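The effect of the consistency level on a write can be sketched as follows. A quorum is `RF // 2 + 1` replicas, and the coordinator can answer the client as soon as that many acks have arrived. The function names and the ack times are illustrative assumptions, not driver or server APIs.

```python
def quorum(rf):
    """A majority of the replicas: more than half of rf."""
    return rf // 2 + 1


def required_acks(cl, rf):
    """Number of replica acks needed for a few common consistency levels."""
    return {"ONE": 1, "QUORUM": quorum(rf), "ALL": rf}[cl]


def write_latency(ack_times_us, cl):
    """Time until enough acks arrived, or None if too few replicas answered.

    ack_times_us holds one entry per replica; None marks a replica that is down.
    """
    needed = required_acks(cl, len(ack_times_us))
    answered = sorted(t for t in ack_times_us if t is not None)
    return answered[needed - 1] if len(answered) >= needed else None


# Three replicas, one of them slow (illustrative values in microseconds).
acks = [5, 12, 500]
print(write_latency(acks, "ONE"))
print(write_latency(acks, "QUORUM"))
print(write_latency(acks, "ALL"))

# One replica down: QUORUM still succeeds, ALL does not.
print(write_latency([5, 12, None], "QUORUM"))
print(write_latency([5, 12, None], "ALL"))
```

With RF = 3, QUORUM needs two acks, so a single slow or failed replica does not delay or fail the write.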
Cassandra - Node failure
• A single node failure shouldn’t bring failure.
• Replication Factor + Consistency Level = Success
• This example:
  • RF = 3
  • CL = QUORUM
[Diagram: parallel write with CL=QUORUM on a five-node ring with copies on Nodes 1, 2 and 3; acks shown at 5 μs, 12 μs and 12 μs]
• A majority of replicas ack, so the request is a success
Cassandra - Node Recovery
• When a write is performed and a replica node for the row is unavailable, the coordinator stores a hint locally (kept for up to 3 hours)
• When the node recovers, the coordinator replays the missed writes.
• Note: a hinted write does not count towards the consistency level
• Note: you should still run repairs across your cluster
[Diagram: five-node ring with copies on Nodes 1, 2 and 3; the coordinator stores hints while Node 3 is offline]
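A minimal sketch of hinted handoff, under simplifying assumptions: the classes `Replica` and `Coordinator` are hypothetical, hints live in memory, and a hint is simply dropped at replay time if it is older than the 3-hour window. Real Cassandra persists hints and applies the window to how long the target node has been down.

```python
import time
from dataclasses import dataclass, field

HINT_WINDOW_SECONDS = 3 * 60 * 60  # the 3 hours mentioned on the slide


@dataclass
class Replica:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)


class Coordinator:
    def __init__(self):
        self.hints = {}  # replica name -> list of (stored_at, key, value)

    def write(self, key, value, replicas, now=None):
        """Write to all live replicas; store a hint for each one that is down."""
        now = time.time() if now is None else now
        acks = 0
        for replica in replicas:
            if replica.up:
                replica.data[key] = value
                acks += 1  # only real replica writes count towards the CL
            else:
                self.hints.setdefault(replica.name, []).append((now, key, value))
        return acks

    def replay(self, replica, now=None):
        """Replay stored hints to a recovered replica, skipping expired ones."""
        now = time.time() if now is None else now
        for stored_at, key, value in self.hints.pop(replica.name, []):
            if now - stored_at <= HINT_WINDOW_SECONDS:
                replica.data[key] = value


coordinator = Coordinator()
replicas = [Replica("Node 1"), Replica("Node 2"), Replica("Node 3", up=False)]
acks = coordinator.write("team", "Mighty Mutts", replicas, now=0)
print(acks)  # two acks: enough for QUORUM with RF = 3

replicas[2].up = True
coordinator.replay(replicas[2], now=60)
print(replicas[2].data)
```

The write returns two acks, so the hint itself never contributes to the consistency level. Hints older than the window are lost, which is why repairs are still needed.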
Cassandra Rack/Zone Failure
• Cassandra will place the data in as many different racks or availability zones as it can.
• This example:
  • RF = 3
  • CL = QUORUM
  • AZ/Rack 2 fails
• Data copies still available in Node 1 and Node 5
• Quorum can be honored, i.e. a majority of replicas ack, so the request is a success
[Diagram: five nodes spread over Rack 1, Rack 2 and Rack 3; Node 1 holds the 1st copy, Node 3 the 2nd copy, Node 5 the 3rd copy]
Cassandra is fast!
• University of Toronto study:
http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf
Why is Cassandra so fast?
• write-optimised - sequential writes to disk
• fast merging - once an SSTable is big enough, it is merged with existing SSTables
• single layout on disk
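The write path behind these points can be sketched as a tiny log-structured store: writes are appended to a commit log and a memtable, full memtables are flushed as sorted immutable SSTables, and compaction merges them. `TinyLSM` is a hypothetical teaching model held in memory, not Cassandra's storage engine.

```python
class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.commit_log = []  # append-only, models sequential disk writes
        self.memtable = {}    # in-memory buffer of recent writes
        self.sstables = []    # immutable sorted runs, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as one sorted, immutable SSTable."""
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def compact(self):
        """Merge all SSTables into one; newer values overwrite older ones."""
        merged = {}
        for table in self.sstables:  # oldest first
            merged.update(table)
        self.sstables = [sorted(merged.items())]

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest first
            for k, v in table:
                if k == key:
                    return v
        return None


store = TinyLSM()
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    store.write(key, value)
print(len(store.sstables), store.read("a"))
store.compact()
print(store.sstables)
```

No write ever modifies an existing SSTable in place, which is what keeps disk access sequential.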
Cassandra Query Language
CQL
• Cassandra Query Language
• CQL is intended to provide a common, simpler and easier to use interface into Cassandra - and you probably already know it!
• e.g. SELECT * FROM users
• Usual statements:
  • CREATE / DROP / ALTER TABLE / SELECT
CQLSH
• Command line interface that comes with Cassandra
• Allows some additional statements

| Command | Description |
|---|---|
| CAPTURE | Captures command output and appends it to a file |
| CONSISTENCY | Shows the current consistency level, or given a level, sets it |
| COPY | Imports and exports CSV (comma-separated values) data |
| DESCRIBE | Provides information about a Cassandra cluster or data objects |
| EXIT | Terminates cqlsh |
| SHOW | Shows the Cassandra version, host, or data type assumptions |
| SOURCE | Executes a file containing CQL statements |
| TRACING | Enables or disables request tracing |
CQL Basics
CREATE KEYSPACE league WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'DataCentre1': 3, 'DataCentre2': 2};
USE league;
CREATE TABLE teams (
  team_name varchar,
  player_name varchar,
  jersey int,
  PRIMARY KEY (team_name, player_name)
);
SELECT * FROM teams WHERE team_name = 'Mighty Mutts' AND player_name = 'Lucky';
INSERT INTO teams (team_name, player_name, jersey) VALUES ('Mighty Mutts', 'Felix', 90);
CQL Data Types
What's up with DataStax?
DataStax at a glance
Founded in April 2010
Santa Clara, Austin, New York, London, Sydney, Paris
370+ Employees
~27 Percent
600+ Customers
DataStax adds value
• Certified, enterprise-ready Cassandra
• Enterprise Functionality: Security, Analytics, Search, Visual Monitoring, Management Services, In-Memory, Dev. IDE & Drivers
• Commercial Confidence: Professional Services, Support & Training
DataStax Analytics
• Designed for running analytics on Cassandra data
• There are 4 ways to do Analytics on Cassandra data:
  1. Integrated Search (Solr)
  2. Integrated Batch Analytics (MapReduce, Hive, Pig, Mahout) on Cassandra
  3. External Batch Analytics (Hadoop; certified with Cloudera, HortonWorks)
  4. Integrated Near Real-Time Analytics (Spark)
OpsCenter 5.0
• Manage multiple clusters and nodes
• Add and remove nodes
• Administer individual nodes or in bulk
• Configure clusters
• Perform rolling restarts
• Automatically repair data
• Rebalance data
• Backup management
• Capacity planning
OpsCenter - New Cluster Example
• A new, 10-node Cassandra (or Hadoop) cluster with OpsCenter running in 3 minutes…
• A new, 10-node DSE cluster with OpsCenter running on AWS in 3 minutes…
[Screenshots: steps 1, 2, 3, Done]
DevCenter 1.1
• Visual Query Tool for Developers and Administrators
• Easily create and run Cassandra Queries
• Visually navigate database objects
• Context-based suggestions
DataStax Office Demo
• 32 Raspberry Pis
• 16 per DataStax Enterprise 4.5 cluster
• Managed in OpsCenter 5.0
• “Red Button” takes down one data center
• Not a performance demo, but a demo of:
  • Availability
  • Commodity hardware
Native Drivers
• Different native drivers available: Java, Python etc.
• Load balancing policies (client driver receives updates)
  • Data Centre Aware
  • Latency Aware
  • Token Aware
• Reconnection policies
• Retry policies
  • Downgrading Consistency
• Plus others..
• http://www.datastax.com/download/clientdrivers
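How data-centre-aware and token-aware load balancing combine can be sketched as an ordering of candidate hosts: replicas in the local data centre first, then other local hosts, then remote hosts. The function `query_plan` and the host names are hypothetical and only model the idea; the real drivers expose this through their own policy classes.

```python
def query_plan(hosts, replicas, local_dc):
    """Order hosts: local replicas, then other local hosts, then remote hosts."""

    def rank(host):
        name, dc = host
        if dc != local_dc:
            return 2  # remote data centre: last resort
        return 0 if name in replicas else 1  # token-aware: replicas first

    # sorted() is stable, so hosts with equal rank keep their original order.
    return [name for name, _dc in sorted(hosts, key=rank)]


hosts = [
    ("eu-1", "EUROPE"),
    ("us-1", "USA"),
    ("us-2", "USA"),
    ("us-3", "USA"),
]
# Assume the row being queried is replicated on us-2 and eu-1.
print(query_plan(hosts, replicas={"us-2", "eu-1"}, local_dc="USA"))
```

Sending the request straight to a local replica avoids both a WAN hop and an extra coordinator hop.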
DataStax Enterprise

| Feature | Open Source | DataStax Enterprise |
|---|---|---|
| Database Software | | |
| Data Platform | Latest Community Cassandra | Production Certified Cassandra |
| Core security features | Yes | Yes |
| Enterprise security features | No | Yes |
| Built-in automatic management services | No | Yes |
| Integrated analytics | No | Yes |
| Integrated enterprise search | No | Yes |
| Workload/Workflow Isolation | No | Yes |
| Easy migration of RDBMS and log data | No | Yes |
| Certified Service Packs | No | Yes |
| Certified platform support | No | Yes |
| Management Software | | |
| OpsCenter | Basic functionality | Advanced functionality |
| Services | | |
| Community Support | Yes | Yes |
| DataStax 24x7x365 Support | No | Yes |
| Quarterly Performance Reviews | No | Yes |
| Hot Fixes | No | Yes |
| Bug Escalation Privilege | No | Yes |
| Custom Builds | No | Option |
| EOL Support | No | Yes |
| Licensing | Free | Subscription |
DataStax Comparison

| | Standard | Pro | Max |
|---|---|---|---|
| Server Data Management Components | | | |
| Production-certified Cassandra | Yes | Yes | Yes |
| Advanced security option | Yes | Yes | Yes |
| Repair service | Yes | Yes | Yes |
| Capacity planning service | Yes | Yes | Yes |
| Enterprise search (built-in Solr) | No | Yes | Yes |
| Analytics (built-in Hadoop) | No | No | Yes |
| Management Tools | | | |
| OpsCenter Enterprise | Yes | Yes | Yes |
| Support Services | | | |
| Expert Support | 24x7x1 | 24x7x1 | 24x7x1 |
| Partner Development Support | Business hours | Business hours | Business hours |
| Certified service packs | Yes | Yes | Yes |
| Hot fixes | Yes | Yes | Yes |
| Bug escalation | Yes | Yes | Yes |
| Quarterly performance reviews | No | No | Yes |
| Bi-weekly call with support team | No | No | Yes |
| Custom builds | No | No | Option |
Use Case: Personalization
Netflix Delights Customers with Personal Recommendations
• World’s leading streaming media provider with digital revenue $1.5BN+
• Tailors content delivery based on viewing preference data captured in Cassandra
• Increased market cap by 600% since 2012
• Introduction of ‘Profiles’ drove throughput to over 10M transactions per second
• Replaced Oracle in six data centers, worldwide, 100% in the cloud
© 2014 DataStax Confidential. Do not distribute without consent.
Realtime Analytics
What is Spark?
• Open source since 2010, Apache project since 2013 - analytics framework
• 10-100x faster than Hadoop MapReduce
• In-memory storage for read and write data
• Single JVM process per node
• Rich Scala, Java and Python APIs
• 2x-5x less code
• Interactive shell
Spark Architecture
Why Spark on Cassandra?
• Data model independent queries
• cross-table operations (JOIN, UNION, etc.)!
• complex analytics (e.g. machine learning)
• data transformation, aggregation etc.
• stream processing (coming soon)
• all nodes are Spark workers
• by default resilient to worker failures
• first node promoted as Spark Master
• Standby Master promoted on failure
• Master HA available in DataStax Enterprise
How to Spark on Cassandra?
• DataStax Cassandra Spark driver
  • Open source: https://github.com/datastax/cassandra-driver-spark
• Compatible with
  • Spark 0.9+
  • Cassandra 2.0+
  • DataStax Enterprise 4.5+
What's new?!
2.1 Release - User Defined Types
CREATE TYPE address (
  street text,
  city text,
  zip_code int,
  phones set<text>
);
CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  addresses map<text, frozen<address>>
);
SELECT id, name, addresses.city, addresses.phones FROM users;
 id       | name  | addresses.city | addresses.phones
----------+-------+----------------+------------------------------
 63bf691f | chris | Berlin         | {'0201234567', '0796622222'}
2.1 Release - Secondary Indexes on collections
CREATE TABLE songs (
id uuid PRIMARY KEY,
artist text,
album text,
title text,
data blob,
tags set<text>
);
CREATE INDEX song_tags_idx ON songs(tags);
SELECT * FROM songs WHERE tags CONTAINS 'blues';
id | album | artist | tags | title
----------+---------------+-------------------+-----------------------+------------------
5027b27e | Country Blues | Lightnin' Hopkins | {'acoustic', 'blues'} | Worrying My Mind
How to start in production?
• DataStax Enterprise or Community
• Hardware:
  • min. 8GB RAM - optimal price-performance sweet spot is 16GB to 64GB
  • 8-core CPU - Cassandra is so efficient in writing that the CPU is the limiting factor
  • SSD disks - plan for the commit log plus 50% free space for compaction; ext3/4 or xfs file system
  • Nodes - cluster recommendation is 3 nodes as minimum
• Alternative: Use the Amazon Images (http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningEC2_c.html)
Thanks! Let's see a demo!