Post on 16-Apr-2017
©2013 DataStax Confidential. Do not distribute without consent.
@PatrickMcFadin
Patrick McFadin, Chief Evangelist, DataStax
Spark and Cassandra: An amazing Apache love story
Store a ton of data
Analyze a ton of data
Community Response?
Cassandra Only DC
Cassandra + Spark DC
Spark Jobs
Spark Streaming
Worker
Worker
Worker
Worker
Analytics Workload
Transactional Workload
DataStax Enterprise
DataStax Enterprise
• 10T of high-frequency event data daily
• Constantly increasing volume
“The web server that powers the interface can query both datacenters, depending on which the user is closest to,”
“A small set of signals tend to double every eight months. So we needed a model that can scale linearly.”
- Arun Jayandra, Microsoft
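The doubling claim in the quote can be turned into a quick projection. This is a minimal sketch, assuming pure exponential growth with an eight-month doubling period. The function name and the normalised starting volume of 1.0 are illustrative and are not figures from the talk.

```python
def projected_volume(initial: float, months: float, doubling_months: float = 8.0) -> float:
    """Return the volume after `months`, assuming it doubles every `doubling_months`."""
    return initial * 2 ** (months / doubling_months)


# Relative growth of one signal over one, two and four years (start = 1.0).
for months in (12, 24, 48):
    print(f"{months} months: x{projected_volume(1.0, months):.1f}")
```

Under this assumption a signal grows eightfold in two years, which is why capacity that grows in step with the number of nodes matters here.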
REST API
O365
Event Hub
Ingestion Worker
(Azure worker role using DataStax C# driver)
C* Analytics
REST API
O365
Kafka
C*/Spark
Streaming Analytics
G4 – Local SSD
Kafka: G4 – Data Disk
ZooKeeper: A7 – Data Disk
PaaS Small
G4 – Local SSD
Cluster 1:
Cluster 2:
20k – 50k events/sec
200k+ events/sec
Data Protection
• Maximilian Schrems v Data Protection Commissioner
• No longer OK to ship EU data to US under “Safe Harbour”
Product_Catalog RF=3
Product_Catalog RF=3
Customer_Data RF=3
Customer_Data RF=0
Product_Catalog RF=3
Customer_Data RF=3
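The per-data-center replication factors on this slide map onto Cassandra's NetworkTopologyStrategy, which takes one replication factor per data center. The sketch below only builds the CQL text and does not connect to a cluster. The data center names `EU` and `US` and the helper `create_keyspace_cql` are assumptions for illustration; the slide does not name the data centers.

```python
from typing import Mapping


def create_keyspace_cql(keyspace: str, replication: Mapping[str, int]) -> str:
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy.

    Data centers with a replication factor of 0 are left out of the
    statement, so no replica of the keyspace is placed there.
    """
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += [f"'{dc}': {rf}" for dc, rf in replication.items() if rf > 0]
    options = ", ".join(parts)
    return f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = {{{options}}};"


# Hypothetical data center names: the catalog is replicated to both,
# while customer data stays in the EU data center only.
print(create_keyspace_cql("Product_Catalog", {"EU": 3, "US": 3}))
print(create_keyspace_cql("Customer_Data", {"EU": 3, "US": 0}))
```

With these inputs, the second statement lists only `EU`, so customer rows are never replicated to the other data center while the product catalog remains available in both.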
• 300k customers
• Report on energy usage
• Predict boiler failure
“We’re dealing largely with time series data, and Spark is 10 to 100 times quicker as it is operating on data in-memory… Cassandra delivers what we need today and if you look at the Internet of Things space; that is what is really useful right now.”
- Jim Anning, British Gas
Hive Active Heating™
Cassandra Only DC
Cassandra + Spark DC
Spark Jobs
Spark Streaming
Home Data Center
Hive Active Heating™
Store a ton of data
Analyze a ton of data
Thank you!