Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin

@PatrickMcFadin

Patrick McFadin, Chief Evangelist, DataStax

Spark and Cassandra: An amazing Apache love story

Store a ton of data

Analyze a ton of data

Community Response?

Cassandra Only DC

Cassandra + Spark DC

Spark Jobs

Spark Streaming

Worker

Worker Worker

Analytics Workload

Transactional Workload

DataStax Enterprise

• 10T of high frequency event data daily
• Constantly increasing volume

“The web server that powers the interface can query both datacenters, depending on which the user is closest to,”

“A small set of signals tend to double every eight months. So we needed a model that can scale linearly.”

- Arun Jayandra, Microsoft
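
The doubling claim in the quote above can be turned into a quick back-of-the-envelope projection. The sketch below is an illustration only: the function name `growth_factor` is made up here, and applying the eight-month doubling to any particular volume is an assumption, since the quote says only that a small set of signals grows this fast.

```python
def growth_factor(months, doubling_period_months=8):
    """Multiplier on volume after `months`, assuming steady doubling."""
    return 2 ** (months / doubling_period_months)


# Illustrative horizons only; these are not figures from the talk.
for months in (8, 16, 24):
    print(months, "months ->", growth_factor(months), "x")
```

Under that assumption a signal is eight times larger after two years, which is the kind of growth a linearly scaling store is meant to absorb by adding nodes.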

REST API

Event Hub

Ingestion Worker

(Azure worker role using DataStax C# driver)

C* Analytics

REST API

Kafka

C*/Spark

Streaming Analytics

G4 – Local SSD

Kafka: G4 – Data Disk

ZooKeeper: A7 – Data Disk

PaaS Small

G4 – Local SSD

Cluster1:

Cluster2:

20k – 50k events/sec

200k+ events/sec
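
To put the per-second figures above on a daily scale, a sustained rate can be multiplied by the 86,400 seconds in a day. This sketch assumes the rate holds for the whole day, which the slide does not claim; `events_per_day` is a name invented for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_day(events_per_second):
    """Daily event count if the given rate were sustained all day."""
    return events_per_second * SECONDS_PER_DAY


# Rates taken from the slide labels; sustained load is an assumption.
for rate in (20_000, 50_000, 200_000):
    print(f"{rate:,} events/sec -> {events_per_day(rate):,} events/day")
```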

Data Protection

• Maximilian Schrems v Data Protection Commissioner
• No longer OK to ship EU data to US under “Safe Harbour”

Product_Catalog RF=3 Product_Catalog RF=3 Customer_Data RF=3 Customer_Data RF=0

Product_Catalog RF=3 Customer_Data RF=3
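
The per-datacenter replication factors on this slide correspond to Cassandra's `NetworkTopologyStrategy`, where each keyspace lists how many replicas it keeps in each datacenter. The sketch below only builds the CQL text and does not connect to a cluster. The datacenter names `dc_eu` and `dc_us` are hypothetical, and treating the US datacenter as the one with RF=0 for customer data is an assumption drawn from the Safe Harbour slide, not something the deck states.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy.

    Datacenters with a replication factor of 0 are left out of the map,
    so no replicas of the keyspace are placed there.
    """
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc, rf in replicas_per_dc.items():
        if rf > 0:
            options.append(f"'{dc}': {rf}")
    replication = "{" + ", ".join(options) + "}"
    return f"CREATE KEYSPACE {keyspace} WITH replication = {replication};"


# Hypothetical datacenter names; the slide does not name them.
print(create_keyspace_cql("product_catalog", {"dc_eu": 3, "dc_us": 3}))
print(create_keyspace_cql("customer_data", {"dc_eu": 3, "dc_us": 0}))
```

With this layout the product catalog is readable locally in both datacenters, while customer data is never written to nodes in the datacenter that is left out of its replication map.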

• 300k customers
• Report on energy usage
• Predict boiler failure

“We’re dealing largely with time series data, and Spark is 10 to 100 times quicker as it is operating on data in-memory… Cassandra delivers what we need today and if you look at the Internet of Things space; that is what is really useful right now.” - Jim Anning, British Gas
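
As a rough illustration of the "report on energy usage" workload over time series data, the sketch below rolls raw readings up into per-customer daily totals. It is plain Python standing in for what would be a Spark job over Cassandra rows in the architecture described; the record layout `(customer_id, timestamp, kwh)`, the function name `daily_usage`, and the sample values are all assumptions made for illustration, not British Gas's data model.

```python
from collections import defaultdict
from datetime import datetime


def daily_usage(readings):
    """Sum kWh per (customer_id, calendar day)."""
    totals = defaultdict(float)
    for customer_id, timestamp, kwh in readings:
        day = timestamp.date().isoformat()
        totals[(customer_id, day)] += kwh
    return dict(totals)


# Made-up sample readings: (customer_id, timestamp, kWh)
readings = [
    ("c1", datetime(2015, 10, 1, 8, 0), 1.5),
    ("c1", datetime(2015, 10, 1, 20, 0), 2.0),
    ("c1", datetime(2015, 10, 2, 8, 0), 1.0),
    ("c2", datetime(2015, 10, 1, 9, 0), 0.5),
]

for (customer_id, day), kwh in sorted(daily_usage(readings).items()):
    print(customer_id, day, kwh)
```

Grouping by customer and day mirrors a common way of bucketing time series so that one day of one customer's readings can be read together.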

Hive Active Heating™

Cassandra Only DC

Cassandra + Spark DC

Spark Jobs

Spark Streaming

Home Data Center

Hive Active Heating™

Store a ton of data

Analyze a ton of data

Thank you!
